Abstract

We develop a full theory for the new class of Optimal Entropy-Transport problems
between nonnegative and finite Radon measures in general topological spaces.

They arise quite naturally by relaxing the marginal constraints typical of Optimal
Transport problems: given a couple of finite measures (with possibly different
total mass), one looks for minimizers of the sum of a linear transport functional
and two convex entropy functionals, that quantify in some way the deviation of the
marginals of the transport plan from the assigned measures.

As a powerful application of this theory, we study the particular case of Logarithmic
Entropy-Transport problems and introduce the new Hellinger-Kantorovich
distance between measures in metric spaces.

The striking connection between these two seemingly far topics allows for a 
deep analysis of the geometric properties of the new geodesic distance, which lies 
somehow between the well-known Hellinger-Kakutani and Kantorovich-Wasserstein 
distances. 
1 Introduction


The aim of the present paper is twofold: In Part I we develop a full theory of the new class
of Optimal Entropy-Transport problems between nonnegative and finite Radon measures
in general topological spaces. As a powerful application of this theory, in Part II we
study the particular case of Logarithmic Entropy-Transport problems and introduce the
new Hellinger-Kantorovich (HK) distance between measures in metric spaces. The striking
connection between these two seemingly far topics is our main focus, and it paves the way
for a beautiful and deep analysis of the geometric properties of the geodesic HK distance,
which (as our proposed name suggests) can be understood as an inf-convolution of the
well-known Hellinger-Kakutani and the Kantorovich-Wasserstein distances. In fact, our
approach to the theory was opposite: in trying to characterize HK, we were first led to the
Logarithmic Entropy-Transport problem, see Section A.

From Transport to Entropy-Transport problems. In the classical Kantorovich
formulation, Optimal Transport problems [371 sni El HZ] deal with minimization of a
linear cost functional

$$\mathscr C(\gamma) := \int_{X_1\times X_2} c(x_1,x_2)\,d\gamma(x_1,x_2), \qquad c : X_1\times X_2\to\mathbb R, \tag{1.1}$$

among all the transport plans, i.e. probability measures $\gamma$ in $\mathcal P(X_1\times X_2)$ whose marginals
$\mu_i = \pi^i_\sharp\gamma\in\mathcal P(X_i)$ are prescribed. Typically, $X_1, X_2$ are Polish spaces, $\mu_i$ are given Borel
measures (but the case of Radon measures in Hausdorff topological spaces has also been
considered, see [231 ETj), the cost function $c$ is a lower semicontinuous (or even Borel)
function, possibly assuming the value $+\infty$, and $\pi^i(x_1,x_2) = x_i$ are the projections on the
$i$-th coordinate, so that

$$\pi^i_\sharp\gamma = \mu_i \quad\Leftrightarrow\quad \mu_1(A_1) = \gamma(A_1\times X_2),\ \ \mu_2(A_2) = \gamma(X_1\times A_2) \quad\text{for every } A_i\in\mathscr B(X_i). \tag{1.2}$$

Starting from the pioneering work of Kantorovich, an impressive theory has been developed
in the last two decades: from one side, typical intrinsic questions of linear programming
problems concerning duality, optimality, uniqueness and structural properties
of optimal transport plans have been addressed and fully analyzed. In a parallel way, this
rich general theory has been applied to many challenging problems in a variety of fields
(probability and statistics, functional analysis, PDEs, Riemannian geometry, nonsmooth
analysis in metric spaces, just to mention a few of them: since it is impossible here to
give an even partial account of the main contributions, we refer to the books |m I3U] for
a more detailed overview and a complete list of references).

The class of Entropy-Transport problems we are going to study arises quite
naturally if one tries to relax the marginal constraints $\pi^i_\sharp\gamma = \mu_i$ by introducing suitable
penalizing functionals $\mathscr F_i$ that quantify in some way the deviation from $\mu_i$ of the marginals
$\gamma_i := \pi^i_\sharp\gamma$ of $\gamma$. In this paper we consider the general case of integral functionals (also
called Csiszár $f$-divergences [U]) of the form
where $F_i : [0,+\infty)\to[0,+\infty]$ are given convex entropy functions, like for the logarithmic
or power-like entropies

$$U_p(s) := \frac{1}{p(p-1)}\big(s^p - p(s-1) - 1\big),\ \ p\in\mathbb R\setminus\{0,1\}, \qquad U_0(s) := s-1-\log s, \qquad U_1(s) := s\log s - s + 1, \tag{1.4}$$

or for the total variation functional corresponding to the nonsmooth entropy $V(s) := |s-1|$,
considered in [35].
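As a small numerical illustration of the entropy functions just listed, the following Python sketch evaluates $U_0$, $U_1$ and $V$ and checks on a sample grid that each is nonnegative, convex and vanishes at $s = 1$. The helper names (`U0`, `U1`, `V`, `is_midpoint_convex`) and the grid are illustrative choices, not notation from the paper; the value conventions at $s = 0$ are the usual ones and are assumed here.

```python
import math

def U0(s):
    """U_0(s) = s - 1 - log s, with U_0(0) = +inf (assumed convention)."""
    if s == 0:
        return math.inf
    return s - 1.0 - math.log(s)

def U1(s):
    """U_1(s) = s log s - s + 1, with the usual convention 0 log 0 = 0."""
    if s == 0:
        return 1.0
    return s * math.log(s) - s + 1.0

def V(s):
    """Total variation entropy V(s) = |s - 1|."""
    return abs(s - 1.0)

def is_midpoint_convex(F, grid, tol=1e-12):
    """Check F((a+b)/2) <= (F(a)+F(b))/2 on all pairs of grid points."""
    return all(F(0.5 * (a + b)) <= 0.5 * (F(a) + F(b)) + tol
               for a in grid for b in grid)

grid = [0.05 * k for k in range(1, 101)]  # sample points in (0, 5]

checks = {
    F.__name__: (abs(F(1.0)) < 1e-15,                  # vanishes at s = 1
                 all(F(s) >= -1e-12 for s in grid),    # nonnegative
                 is_midpoint_convex(F, grid))          # convex on the grid
    for F in (U0, U1, V)
}

# F(s)/s for large s approximates the slope at infinity: it stays bounded for
# U0 and V (linear growth) and keeps growing for U1 (superlinear growth).
s_large = 1e6
growth = {F.__name__: F(s_large) / s_large for F in (U0, U1, V)}
print(checks, growth)
```

The bounded ratio for `U0` and `V` is the situation in which a singular part of the marginal can carry finite cost, whereas the superlinear `U1` does not allow it.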

Notice that the presence of the singular part in the Lebesgue decomposition of $\gamma_i$
in (1.3) does not force $F_i(s)$ to be superlinear as $s\uparrow+\infty$ and allows for all the exponents
$p$ in (1.4).

Once a specific choice of entropies $F_i$ and of finite nonnegative Radon measures
$\mu_i\in\mathcal M(X_i)$ is given, the Entropy-Transport problem can be formulated as

$$\mathsf{ET}(\mu_1,\mu_2) := \inf\Big\{\mathscr E(\gamma|\mu_1,\mu_2) : \gamma\in\mathcal M(X_1\times X_2)\Big\}, \tag{1.5}$$

where $\mathscr E$ is the convex functional

$$\mathscr E(\gamma|\mu_1,\mu_2) := \mathscr F_1(\gamma_1|\mu_1) + \mathscr F_2(\gamma_2|\mu_2) + \int_{X_1\times X_2} c(x_1,x_2)\,d\gamma. \tag{1.6}$$

Notice that the entropic formulation allows for measures $\mu_1$, $\mu_2$ and $\gamma$ with possibly
different total mass.

The flexibility in the choice of the entropy functions $F_i$ (which may also take the value
$+\infty$) covers a wide spectrum of situations (see Section 3.3 for various examples) and in
particular guarantees that (1.5) is a real generalization of the classical optimal transport
problem, which can be recovered as a particular case of (1.6) when $F_i(s)$ is the indicator
function of $\{1\}$ (i.e. $F_i(s)$ always takes the value $+\infty$ with the only exception of $s = 1$,
where it vanishes).

Since we think that the structure (1.6) of Entropy-Transport problems will lead to new
and interesting models and applications, we have tried to establish their basic theory in
the greatest generality, by pursuing the same line of development of Transport problems:
in particular we will obtain general results concerning existence, duality and optimality
conditions.

Considering e.g. the Logarithmic Entropy case, where $F_i(s) = s\log s - (s-1)$, the
dual formulation of (1.5) is given by

$$\mathsf D(\mu_1,\mu_2) := \sup\Big\{\mathscr D(\varphi_1,\varphi_2|\mu_1,\mu_2) : \varphi_i : X_i\to\mathbb R,\ \varphi_1(x_1)+\varphi_2(x_2)\le c(x_1,x_2)\Big\},$$

$$\text{where}\quad \mathscr D(\varphi_1,\varphi_2|\mu_1,\mu_2) := \int_{X_1}\big(1-e^{-\varphi_1}\big)\,d\mu_1 + \int_{X_2}\big(1-e^{-\varphi_2}\big)\,d\mu_2, \tag{1.7}$$

where one can immediately recognize the same convex constraint of Transport problems:
the couple of dual potentials $\varphi_i$ should satisfy $\varphi_1\oplus\varphi_2\le c$ on $X_1\times X_2$. The main difference
is due to the concavity of the objective functional

$$(\varphi_1,\varphi_2)\mapsto \int_{X_1}\big(1-e^{-\varphi_1}\big)\,d\mu_1 + \int_{X_2}\big(1-e^{-\varphi_2}\big)\,d\mu_2,$$

whose form can be explicitly calculated in terms of the Lagrangian conjugates $F_i^*$ of the
entropy functions. The change of variables $\psi_i := 1-e^{-\varphi_i}$ transforms (1.7) into the equivalent
problem of maximizing the linear functional

$$(\psi_1,\psi_2)\mapsto \int_{X_1}\psi_1\,d\mu_1 + \int_{X_2}\psi_2\,d\mu_2 \tag{1.8}$$

on the more complicated convex set

$$\Big\{(\psi_1,\psi_2) : \psi_i : X_i\to(-\infty,1),\ \ (1-\psi_1(x_1))(1-\psi_2(x_2))\ge e^{-c(x_1,x_2)}\Big\}. \tag{1.9}$$

We will calculate the dual problem for every choice of $F_i$ and show that its value always
coincides with $\mathsf{ET}(\mu_1,\mu_2)$. The dual problem also provides optimality conditions, that
involve the couple of potentials $(\varphi_1,\varphi_2)$, the support of the optimal plan $\gamma$ and the
densities $\sigma_i$ of its marginals $\gamma_i$ w.r.t. $\mu_i$. For the Logarithmic Entropy Transport problem
above, they read as

$$\sigma_i > 0, \qquad \varphi_i = -\log\sigma_i \quad \mu_i\text{-a.e. in } X_i,$$

$$\varphi_1\oplus\varphi_2\le c \ \text{ in } X_1\times X_2, \qquad \varphi_1\oplus\varphi_2 = c \quad \gamma\text{-a.e. in } X_1\times X_2, \tag{1.10}$$

and they are necessary and sufficient for optimality.
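The equality between primal and dual values and the optimality conditions can be checked by hand in the simplest possible situation. The Python sketch below assumes two Dirac masses $\mu_i = a_i\delta_{x_i}$, the logarithmic entropy, and a finite cost value $c = c(x_1,x_2)$; under these assumptions every admissible plan with finite cost is $\theta\,\delta_{(x_1,x_2)}$, so the primal problem is a one-dimensional convex minimization. All function names and the numbers `a1`, `a2`, `c` are illustrative, and the ternary search is just one convenient way to minimize a convex function of one variable.

```python
import math

def U1(s):
    """Logarithmic entropy s log s - s + 1 (with 0 log 0 = 0)."""
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0

def primal_objective(theta, a1, a2, c):
    """Entropy-transport cost of the plan theta * delta_(x1,x2)."""
    return a1 * U1(theta / a1) + a2 * U1(theta / a2) + theta * c

def dual_objective(phi1, phi2, a1, a2):
    """Dual functional for the logarithmic entropy, evaluated on two Diracs."""
    return a1 * (1.0 - math.exp(-phi1)) + a2 * (1.0 - math.exp(-phi2))

def minimize_convex(g, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

a1, a2, c = 2.0, 0.5, 0.7   # illustrative masses and cost value

# Primal side: optimal mass theta of the plan and its cost.
theta = minimize_convex(lambda t: primal_objective(t, a1, a2, c), 0.0, max(a1, a2))
primal = primal_objective(theta, a1, a2, c)

# Potentials built from the marginal densities sigma_i = theta / a_i.
sigma1, sigma2 = theta / a1, theta / a2
phi1, phi2 = -math.log(sigma1), -math.log(sigma2)
dual = dual_objective(phi1, phi2, a1, a2)

print(primal, dual, phi1 + phi2)
```

In this example the potentials $\varphi_i = -\log\sigma_i$ saturate the constraint $\varphi_1 + \varphi_2 = c$ on the support of the plan and the two values agree up to the accuracy of the search, as the optimality conditions predict.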

The study of optimality conditions reveals a different behavior between pure transport
problems and the other entropic ones. In particular, the $c$-cyclical monotonicity of the
optimal plan $\gamma$ (which is still satisfied in the entropic case) does not play a crucial role in
the construction of the potentials $\varphi_i$. When $F_i(0)$ are finite (as in the logarithmic case)
it is possible to obtain a general existence result of (generalized) optimal potentials even
when $c$ takes the value $+\infty$.

A crucial feature of Entropy-Transport problems (which is not shared by the pure
transport ones) concerns a third “homogeneous” formulation, which exhibits new
and unexpected properties. It is related to the 1-homogeneous Marginal Perspective
function

$$H(x_1,r_1;x_2,r_2) := \inf_{\theta>0}\Big(r_1F_1(\theta/r_1) + r_2F_2(\theta/r_2) + \theta\,c(x_1,x_2)\Big) \tag{1.11}$$

and to the corresponding integral functional

$$\mathscr H(\mu_1,\mu_2|\gamma) := \int_{X_1\times X_2} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \sum_i F_i(0)\,\mu_i^\perp(X_i), \qquad \varrho_i := \frac{d\mu_i}{d\gamma_i}, \tag{1.12}$$

where $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$ is the “reverse” Lebesgue decomposition of $\mu_i$ w.r.t. the marginals
$\gamma_i$ of $\gamma$. We will prove that

$$\mathsf{ET}(\mu_1,\mu_2) = \min\Big\{\mathscr H(\mu_1,\mu_2|\gamma) : \gamma\in\mathcal M(X_1\times X_2)\Big\} \tag{1.13}$$

with a precise relation between optimal plans. In the Logarithmic Entropy case $F_i(s) =
s\log s - (s-1)$ the marginal perspective function $H$ takes the particular form

$$H(x_1,r_1;x_2,r_2) = r_1 + r_2 - 2\sqrt{r_1r_2}\,\exp\big(-c(x_1,x_2)/2\big), \tag{1.14}$$

which will be the starting point for understanding the deep connection with the Hellinger-Kantorovich
distance. Notice that in the case when $X_1 = X_2$ and $c$ is the singular cost

$$c(x_1,x_2) := \begin{cases} 0 & \text{if } x_1 = x_2,\\ +\infty & \text{otherwise,}\end{cases} \tag{1.15}$$

(1.13) provides an equivalent formulation of the Hellinger-Kakutani distance [20, 22], see
also Example E.5 in Section 3.3.
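To make the reduction to the Hellinger-Kakutani distance concrete, the following Python sketch works with purely atomic measures on a finite set, which is an assumption made only for illustration. With the singular cost only diagonal pairs $(x,x)$ have finite cost, so the homogeneous formulation can be evaluated atom by atom with the closed form of $H$ for the logarithmic entropy. The names `H_log`, `singular_cost`, `hellinger_sq` and the sample measures are hypothetical.

```python
import math

def H_log(r1, r2, c):
    """Marginal perspective function for F_i(s) = s log s - s + 1:
    r1 + r2 - 2 sqrt(r1 r2) exp(-c/2)."""
    return r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.exp(-c / 2.0)

def singular_cost(x1, x2):
    """Zero on the diagonal, +infinity elsewhere."""
    return 0.0 if x1 == x2 else math.inf

def hellinger_sq(mu1, mu2):
    """Squared Hellinger distance between atomic measures given as dicts."""
    points = set(mu1) | set(mu2)
    return sum((math.sqrt(mu1.get(x, 0.0)) - math.sqrt(mu2.get(x, 0.0))) ** 2
               for x in points)

# Two atomic measures with different total mass (illustrative data).
mu1 = {"a": 1.0, "b": 4.0}
mu2 = {"a": 2.25, "c": 0.5}

# Sum of H over the diagonal, using the masses of the atoms as densities.
points = sorted(set(mu1) | set(mu2))
diagonal_sum = sum(H_log(mu1.get(x, 0.0), mu2.get(x, 0.0), singular_cost(x, x))
                   for x in points)

# Off the diagonal the exponential factor vanishes and H is just r1 + r2.
off_diagonal = H_log(mu1["a"], mu2["a"], singular_cost("a", "c"))

print(diagonal_sum, hellinger_sq(mu1, mu2), off_diagonal)
```

On the diagonal the formula collapses to $(\sqrt{r_1}-\sqrt{r_2})^2$, so the sum reproduces the squared Hellinger distance, while atoms charged by only one measure contribute their full mass.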

Other choices, still in the simple class (1.4), give rise to “transport” versions of well
known functionals (see e.g. [28] for a systematic presentation): starting from the reversed
entropies $F_i(s) = s - 1 - \log s$ one gets

$$H(x_1,r_1;x_2,r_2) = r_1\log r_1 + r_2\log r_2 - (r_1+r_2)\log\Big(\frac{r_1+r_2}{2+c(x_1,x_2)}\Big), \tag{1.16}$$

which in the extreme case (1.15) reduces to the Jensen-Shannon divergence [29], a squared
distance between measures derived from the celebrated Kullback-Leibler divergence [25].
The quadratic entropy $F_i(s) = \frac12(s-1)^2$ produces

$$H(x_1,r_1;x_2,r_2) = \frac{1}{2(r_1+r_2)}\Big((r_1-r_2)^2 + h(c(x_1,x_2))\,r_1r_2\Big), \tag{1.17}$$

where $h(c) = c(4-c)$ if $0\le c\le 2$ and $4$ if $c\ge 2$: Equation (1.17) can be seen as the
transport variant of the triangular discrimination (also called symmetric $\chi^2$-measure),
based on the Pearson $\chi^2$-divergence, and still obtained by (1.12) when $c$ has the form
(1.15).

Also nonsmooth cases, as for $V(s) = |s-1|$ associated to the total variation distance
(or nonsymmetric choices of $F_i$) can be covered by the general theory. In the case of
$F_i(s) = V(s)$ the marginal perspective function is

$$H(x_1,r_1;x_2,r_2) = r_1 + r_2 - (2 - c(x_1,x_2))_+\,(r_1\wedge r_2) = |r_2 - r_1| + (c(x_1,x_2)\wedge 2)(r_1\wedge r_2);$$

when $X_1 = X_2 = \mathbb R^d$ with $c(x_1,x_2) := |x_1 - x_2|$ we recover the generalized Wasserstein
distance $W_1^{1,1}$ introduced and studied by [35]; it provides an equivalent variational
characterization of the flat metric [36].
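The closed forms of the marginal perspective function for the reversed, quadratic and total variation entropies can be sanity-checked numerically. The Python sketch below evaluates the objective $\theta\mapsto r_1F(\theta/r_1) + r_2F(\theta/r_2) + \theta c$ at a candidate minimizer $\theta^*$ obtained from the first-order condition (our own computation, so it should be read as an assumption) and verifies on a grid that no other $\theta$ does better. All names and test values are illustrative.

```python
import math

def F_rev(s):   # reversed entropy s - 1 - log s
    return math.inf if s <= 0 else s - 1.0 - math.log(s)

def F_quad(s):  # quadratic entropy (s - 1)^2 / 2
    return 0.5 * (s - 1.0) ** 2

def F_tv(s):    # total variation entropy |s - 1|
    return abs(s - 1.0)

def objective(F, theta, r1, r2, c):
    """Function of theta whose infimum defines H(r1, r2) for the cost value c."""
    return r1 * F(theta / r1) + r2 * F(theta / r2) + theta * c

# Closed forms quoted in the text.
def H_rev(r1, r2, c):
    return r1 * math.log(r1) + r2 * math.log(r2) - (r1 + r2) * math.log((r1 + r2) / (2.0 + c))

def H_quad(r1, r2, c):
    h = c * (4.0 - c) if c <= 2.0 else 4.0
    return ((r1 - r2) ** 2 + h * r1 * r2) / (2.0 * (r1 + r2))

def H_tv(r1, r2, c):
    return abs(r2 - r1) + min(c, 2.0) * min(r1, r2)

# Candidate minimizers (assumed, derived from the first-order condition).
def theta_rev(r1, r2, c):
    return (r1 + r2) / (2.0 + c)

def theta_quad(r1, r2, c):
    return max(2.0 - c, 0.0) * r1 * r2 / (r1 + r2)

def theta_tv(r1, r2, c):
    return min(r1, r2) if c < 2.0 else 0.0

def max_discrepancy(cases, grid):
    """Largest gap between closed form and objective at theta*, and the
    largest amount by which any grid value falls below the closed form."""
    families = [(F_rev, H_rev, theta_rev), (F_quad, H_quad, theta_quad), (F_tv, H_tv, theta_tv)]
    gap, violation = 0.0, 0.0
    for r1, r2, c in cases:
        for F, H, theta_star in families:
            closed = H(r1, r2, c)
            gap = max(gap, abs(objective(F, theta_star(r1, r2, c), r1, r2, c) - closed))
            lowest = min(objective(F, th, r1, r2, c) for th in grid)
            violation = max(violation, closed - lowest)
    return gap, violation

cases = [(1.5, 0.4, 0.8), (1.5, 0.4, 3.0), (0.7, 0.7, 1.2)]
grid = [0.01 * k for k in range(1, 501)]
gap, violation = max_discrepancy(cases, grid)
print(gap, violation)
```

Both numbers should be at the level of rounding errors: the closed form is attained at $\theta^*$ and is a lower bound for the objective on the whole grid, including the regime $c\ge 2$ where the minimizer of the quadratic and total variation cases is $\theta^* = 0$.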

However, because of our original motivation (see Section A), Part II will focus on the
case of the logarithmic entropy $F_i = U_1$, where $H$ is given by (1.14). We will exploit its
relevant geometric applications, reserving the other examples for future investigations.

From the Kantorovich-Wasserstein distance to the Hellinger-Kantorovich distance.
From the analytic-geometric point of view, one of the most interesting cases of
transport problems occurs when $X_1 = X_2 = X$ coincide and the cost functional $\mathscr C$ is
induced by a distance $d$ on $X$: in the quadratic case, the minimum value of (1.1) for given
measures $\mu_1,\mu_2$ in the space $\mathcal P_2(X)$ of probability measures with finite quadratic moment
defines the so called $L^2$-Kantorovich-Wasserstein distance

$$W_d^2(\mu_1,\mu_2) := \inf\Big\{\int d^2(x_1,x_2)\,d\gamma(x_1,x_2) : \gamma\in\mathcal P(X\times X),\ \pi^i_\sharp\gamma = \mu_i\Big\}, \tag{1.18}$$

which metrizes the weak convergence (with quadratic moments) of probability measures.
The metric space $(\mathcal P_2(X), W_d)$ inherits many geometric features from the underlying $(X,d)$
(as separability, completeness, length and geodesic properties, positive curvature in the
Alexandrov sense, see 0 ). Its dynamic characterization in terms of the continuity equation
[7] and its dual formulation in terms of the Hopf-Lax formula and the corresponding
(sub-)solutions of the Hamilton-Jacobi equation [31] lie at the core of the applications to
gradient flows and partial differential equations of diffusion type [2]. Finally, the behavior
of entropy functionals as (1.3) along geodesics in $(\mathcal P_2(X), W_d)$ [3^ 131] fll] encodes
valuable geometric information, with relevant applications to Riemannian geometry and
to the recent theory of metric-measure spaces with Ricci curvature bounded from below

m SSI I3I1131I11 Ellin].

It has been a challenging question to find a corresponding distance (enjoying analogous
deep geometric properties) between finite positive Borel measures with arbitrary mass in
$\mathcal M(X)$. In the present paper we will show that by choosing the particular cost function

$$c(x_1,x_2) := \ell(d(x_1,x_2)), \quad\text{where}\quad \ell(d) := \begin{cases} -\log\big(\cos^2(d)\big) & \text{if } d < \pi/2,\\ +\infty & \text{otherwise,}\end{cases} \tag{1.19}$$

the corresponding Logarithmic-Entropy Transport problem

$$\mathsf{LET}(\mu_1,\mu_2) := \min_{\gamma\in\mathcal M(X\times X)}\ \sum_i\int_X\big(\sigma_i\log\sigma_i - \sigma_i + 1\big)\,d\mu_i + \int_{X^2}\ell(d(x_1,x_2))\,d\gamma, \qquad \sigma_i = \frac{d\gamma_i}{d\mu_i}, \tag{1.20}$$

coincides with a (squared) distance in $\mathcal M(X)$ (which we will call Hellinger-Kantorovich
distance and denote by HK) that can play the same fundamental role as the Kantorovich-Wasserstein
distance for $\mathcal P_2(X)$.

Here is a schematic list of our main results: 


(i) The representation (1.13) based on the Marginal Perspective function (1.14) yields

$$\mathsf{LET}(\mu_1,\mu_2) = \min\Big\{\int\Big(\varrho_1 + \varrho_2 - 2\sqrt{\varrho_1\varrho_2}\,\cos\big(d(x_1,x_2)\wedge\pi/2\big)\Big)\,d\gamma : \varrho_i = \frac{d\mu_i}{d\gamma_i}\Big\}. \tag{1.21}$$

(ii) By performing the rescaling $r_i\mapsto r_i^2$ we realize that the function $H(x_1,r_1^2;x_2,r_2^2)$ is
strictly related to the squared (semi)-distance

$$d_{\mathfrak C}^2(x_1,r_1;x_2,r_2) := r_1^2 + r_2^2 - 2r_1r_2\cos\big(d(x_1,x_2)\wedge\pi\big), \qquad (x_i,r_i)\in X\times\mathbb R_+, \tag{1.22}$$

which is the so-called cone distance in the metric cone $\mathfrak C$ over $X$, cf. [9]. The latter
is the quotient space of $X\times\mathbb R_+$ obtained by collapsing all the points $(x,0)$, $x\in X$,
in a single point $\mathfrak o$, called the vertex of the cone. We introduce the notion of
“2-homogeneous marginal”

$$\mu = \mathfrak h^2\alpha \quad:\Leftrightarrow\quad \int_X\zeta(x)\,d\mu = \int_{\mathfrak C}\zeta(x)\,r^2\,d\alpha(x,r) \quad\text{for every } \zeta\in C_b(X), \tag{1.23}$$

to “project” measures $\alpha\in\mathcal M(\mathfrak C)$ on measures $\mu\in\mathcal M(X)$. Conversely, there are
many ways to “lift” a measure $\mu\in\mathcal M(X)$ to $\alpha\in\mathcal M(\mathfrak C)$ (e.g. by taking $\alpha := \mu\otimes\delta_1$).
The Hellinger-Kantorovich distance $\mathsf{HK}(\mu_1,\mu_2)$ can then be defined by taking the best
Kantorovich-Wasserstein distance between all the possible lifts of $\mu_1,\mu_2$ in $\mathcal P_2(\mathfrak C)$,
i.e.

$$\mathsf{HK}(\mu_1,\mu_2) = \min\Big\{W_{d_{\mathfrak C}}(\alpha_1,\alpha_2) : \alpha_i\in\mathcal P_2(\mathfrak C),\ \mathfrak h^2\alpha_i = \mu_i\Big\}. \tag{1.24}$$

It turns out that (the square of) (1.24) yields an equivalent variational representation
of the LET functional. In particular, (1.24) shows that in the case of concentrated
measures

$$\mathsf{LET}(a_1\delta_{x_1},a_2\delta_{x_2}) = \mathsf{HK}^2(a_1\delta_{x_1},a_2\delta_{x_2}) = a_1 + a_2 - 2\sqrt{a_1a_2}\,\cos\big(d(x_1,x_2)\wedge\pi/2\big). \tag{1.25}$$

Notice that (1.24) resembles the very definition (1.18) of the Kantorovich-Wasserstein
distance, where now the role of the marginals $\pi^i_\sharp$ is replaced by the homogeneous
marginals $\mathfrak h^2$. It is a nontrivial part of the equivalence statement to check that the
difference between the cut-off thresholds ($\pi/2$ in (1.21) and $\pi$ in (1.22)) does not
affect the identity $\mathsf{LET} = \mathsf{HK}^2$.

(iii) By refining the representation formula (1.24) by a suitable rescaling and gluing technique
we can prove that $(\mathcal M(X),\mathsf{HK})$ is a geodesic metric space, a property that
is absolutely not obvious from the LET-representation and depends on a subtle interplay
of the entropy functions $F_i(\sigma) = \sigma\log\sigma - \sigma + 1$ and the cost function $c$
from (1.19). We show that the metric induces the weak convergence of measures in
duality with bounded and continuous functions, thus it is topologically equivalent
to the flat or Bounded Lipschitz distance [IZl Sec. 11.3], see also [21, Thm. 3]. It
also inherits the separability, completeness, length and geodesic properties from the
corresponding ones of the underlying space $(X,d)$. On top of that, we will prove
a precise superposition principle (in the same spirit of the Kantorovich-Wasserstein
one [2, Sect. 8], [in]) for general absolutely continuous curves in $(\mathcal M(X),\mathsf{HK})$ in terms
of dynamic plans in $\mathfrak C$: as a byproduct, we can give a precise characterization of absolutely
continuous curves and geodesics as homogeneous marginals of corresponding
curves in $(\mathcal P_2(\mathfrak C), W_{d_{\mathfrak C}})$. An interesting consequence of these results concerns the
lower curvature bound of $(\mathcal M(X),\mathsf{HK})$ in the sense of Alexandrov: it is a positively
curved space if and only if $(X,d)$ is a geodesic space with curvature $\ge 1$.

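(iv) The dual formulation of the LET problem provides a dual characterization of HK, viz.

$$\frac12\mathsf{HK}^2(\mu_1,\mu_2) = \sup\Big\{\int\mathscr P_1\xi\,d\mu_2 - \int\xi\,d\mu_1 : \xi\in\mathrm{Lip}_{bs}(X),\ \inf_X\xi > -1/2\Big\}, \tag{1.26}$$

where $(\mathscr P_t)_{0\le t\le 1}$ is given by the inf-convolution

$$\mathscr P_t\xi(x) := \inf_{x'\in X}\Big(\frac{\xi(x')}{1+2t\xi(x')} + \frac{\sin^2\big(d(x,x')\wedge\pi/2\big)}{2t+4t^2\xi(x')}\Big) = \inf_{x'\in X}\frac{1}{2t}\Big(1 - \frac{\cos^2\big(d(x,x')\wedge\pi/2\big)}{1+2t\xi(x')}\Big).$$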

(v) By exploiting the Hopf-Lax representation formula for the Hamilton-Jacobi equation
in $\mathfrak C$, we will show that for arbitrary initial data $\xi\in\mathrm{Lip}_b(X)$ with $\inf\xi > -1/2$ the
function $\xi_t := \mathscr P_t\xi$ is a subsolution (a solution, if $(X,d)$ is a length space) of

$$\partial_t\xi_t(x) + \frac12|D_X\xi_t|^2(x) + 2\xi_t^2(x) \le 0 \quad\text{pointwise in } X\times(0,1).$$

If $(X,d)$ is a length space we thus obtain the characterization

$$\frac12\mathsf{HK}^2(\mu_0,\mu_1) = \sup\Big\{\int_X\xi_1\,d\mu_1 - \int_X\xi_0\,d\mu_0 : \xi\in C^1([0,1];\mathrm{Lip}_b(X)),\ \ \partial_t\xi_t(x) + \frac12|D_X\xi_t|^2(x) + 2\xi_t^2(x)\le 0 \ \text{ in } X\times(0,1)\Big\}, \tag{1.27}$$

which reproduces, at the level of HK, the nice link between $W_d$ and Hamilton-Jacobi
equations. One of the direct applications of (1.27) is a sharp contraction property
w.r.t. HK for the Heat flow in $\mathrm{RCD}(0,\infty)$ metric measure spaces (and therefore in
every Riemannian manifold with nonnegative Ricci curvature).

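(vi) (1.27) clarifies that the HK distance can be interpreted as a sort of inf-convolution
between the Hellinger distance (in duality with solutions to the ODE $\partial_t\xi_t + 2\xi_t^2 = 0$) and the
Kantorovich-Wasserstein distance (in duality with (sub-)solutions to
$\partial_t\xi_t(x) + \frac12|D_X\xi_t|^2(x)\le 0$). The Hellinger distance

$$\mathsf{Hell}^2(\mu_1,\mu_2) = \int_X\big(\sqrt{\varrho_1} - \sqrt{\varrho_2}\big)^2\,d\gamma, \qquad \mu_i = \varrho_i\gamma,$$

corresponds to the HK functional generated by the discrete distance ($d(x_1,x_2) = \pi/2$
if $x_1\ne x_2$). We will prove that

$$\mathsf{HK}(\mu_1,\mu_2)\le\mathsf{Hell}(\mu_1,\mu_2), \qquad \mathsf{HK}(\mu_1,\mu_2)\le W_d(\mu_1,\mu_2),$$

$$\mathsf{HK}_{nd}(\mu_1,\mu_2)\uparrow\mathsf{Hell}(\mu_1,\mu_2), \qquad n\,\mathsf{HK}_{d/n}(\mu_1,\mu_2)\uparrow W_d(\mu_1,\mu_2) \quad\text{as } n\uparrow\infty,$$

where $\mathsf{HK}_{nd}$ (resp. $\mathsf{HK}_{d/n}$) is the HK distance induced by $nd$ (resp. $d/n$).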

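(vii) Combining the superposition principle and the duality with Hamilton-Jacobi equations,
we eventually prove that HK admits an equivalent dynamic characterization
“à la Benamou-Brenier” u\m (see also the recent [21]) in $X = \mathbb R^d$:

$$\mathsf{HK}^2(\mu_0,\mu_1) = \min\Big\{\int_0^1\!\!\int\Big(|v_t|^2 + \frac14|w_t|^2\Big)\,d\mu_t\,dt : \mu\in C([0,1];\mathcal M(\mathbb R^d)),\ \ \mu_{t=i} = \mu_i,\ \ \partial_t\mu_t + \nabla\cdot(v_t\mu_t) = w_t\mu_t \ \text{ in } \mathscr D'(\mathbb R^d\times(0,1))\Big\}. \tag{1.28}$$

Moreover, for the length space $X = \mathbb R^d$ a curve $[0,1]\ni t\mapsto\mu(t)$ is a geodesic curve
w.r.t. HK if and only if the coupled system

$$\partial_t\mu_t + \nabla\cdot(D_x\xi_t\,\mu_t) = 4\xi_t\mu_t, \qquad \partial_t\xi_t + \frac12|D_x\xi_t|^2 + 2\xi_t^2 = 0 \tag{1.29}$$

holds for a suitable solution $\xi_t$. The representation (1.28) is the starting
point for further investigations and examples, which we have collected in [27].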
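Several statements in the list above can be tested on the simplest example of two Dirac masses. The Python sketch below assumes $\mu_i = a_i\delta_{x_i}$ with $d = d(x_1,x_2)$ and uses two facts: for Diracs the LET problem reduces to the closed form of $H$ with cost $\ell(d)$, and $W_d(\delta_{x_1},\delta_{x_2}) = d$, $\mathsf{Hell}^2(\delta_{x_1},\delta_{x_2}) = 2$ for unit masses at distinct points. Function names and sample values are illustrative only.

```python
import math

def ell(d):
    """Cost l(d) = -log(cos^2 d) for d < pi/2, +infinity otherwise."""
    return -2.0 * math.log(math.cos(d)) if d < math.pi / 2 else math.inf

def cone_dist_sq(d, r1, r2, cutoff=math.pi):
    """Squared cone (semi-)distance between points of radii r1, r2 whose
    base points are at distance d, with the given cut-off on d."""
    return r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(min(d, cutoff))

def hk_sq_diracs(a1, a2, d):
    """HK^2 between a1*delta_x1 and a2*delta_x2: cone formula with radii
    sqrt(a_i) and cut-off pi/2."""
    return cone_dist_sq(d, math.sqrt(a1), math.sqrt(a2), cutoff=math.pi / 2)

def let_diracs(a1, a2, d):
    """LET for two Diracs via H = a1 + a2 - 2 sqrt(a1 a2) exp(-c/2), c = l(d)."""
    return a1 + a2 - 2.0 * math.sqrt(a1 * a2) * math.exp(-ell(d) / 2.0)

# LET and HK^2 agree below and above the threshold pi/2.
pairs = [(2.0, 0.5, 0.8), (2.0, 0.5, 2.0), (1.0, 1.0, 1.3)]
let_vs_hk = [abs(let_diracs(a1, a2, d) - hk_sq_diracs(a1, a2, d)) for a1, a2, d in pairs]

# Comparison with Hellinger and Wasserstein for unit masses at distance d.
d = 1.0
hell, wasserstein = math.sqrt(2.0), d
hk_nd = [math.sqrt(hk_sq_diracs(1.0, 1.0, n * d)) for n in (1, 2, 4, 8)]
n_hk_d_over_n = [n * math.sqrt(hk_sq_diracs(1.0, 1.0, d / n)) for n in (1, 2, 4, 8)]

# Beyond pi/2 the plain cone distance (cut-off pi) between the two single-point
# lifts is larger than HK^2, which saturates at a1 + a2.
single_lift_cost = cone_dist_sq(2.5, 1.0, 1.0)
hk_far = hk_sq_diracs(1.0, 1.0, 2.5)

print(let_vs_hk, hk_nd, n_hk_d_over_n, single_lift_cost, hk_far)
```

The two sequences increase towards `hell` and `wasserstein` respectively, in line with item (vi). The last two numbers illustrate why the different cut-offs require an argument: between distant Diracs a single-point lift is not the cheapest one.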

It is not superfluous to recall that the HK variational problem is just one example in the
realm of Entropy-Transport problems and we think that other interesting applications can
arise by different choices of entropies and cost. One of the simplest variations is to choose
the (seemingly more natural) quadratic cost function $c(x_1,x_2) := d^2(x_1,x_2)$ instead of the
more “exotic” (1.19). The resulting functional is still associated to a distance expressed
by

$$\mathsf{GHK}^2(\mu_1,\mu_2) := \min\Big\{\int\Big(r_1^2 + r_2^2 - 2r_1r_2\exp\big(-d^2(x_1,x_2)/2\big)\Big)\,d\alpha\Big\} \tag{1.30}$$

where the minimum runs among all the plans $\alpha\in\mathcal M(\mathfrak C\times\mathfrak C)$ such that $\mathfrak h^2\pi^i_\sharp\alpha = \mu_i$ (we
propose the name “Gaussian Hellinger-Kantorovich distance”). If $(X,d)$ is a complete,
separable and length metric space, $(\mathcal M(X),\mathsf{GHK})$ is a complete and separable metric space,
inducing the weak topology as HK. However, it is not a length space in general, and we
will show that the length distance generated by GHK is precisely HK.
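The relation between GHK and HK can be observed numerically on Dirac masses. The Python sketch below rests on assumptions that go beyond the text: $X = \mathbb R$, the measures are $a_i\delta_{x_i}$ (so both distances are given by the marginal perspective formula with the respective cost), and the curve of Diracs is the one obtained from the straight segment in the cone joining the points of radius 1 at angles 0 and $d < \pi/2$, with mass equal to the squared radius. Summing GHK over finer and finer partitions of this curve approximates its GHK-length.

```python
import math

def hk_diracs(a1, a2, d):
    """HK between a1*delta_x1 and a2*delta_x2 at distance d (Dirac formula)."""
    sq = a1 + a2 - 2.0 * math.sqrt(a1 * a2) * math.cos(min(d, math.pi / 2))
    return math.sqrt(max(sq, 0.0))

def ghk_diracs(a1, a2, d):
    """GHK between the same Diracs: the cosine is replaced by exp(-d^2/2)."""
    sq = a1 + a2 - 2.0 * math.sqrt(a1 * a2) * math.exp(-d * d / 2.0)
    return math.sqrt(max(sq, 0.0))

def ghk_chain_length(d, n):
    """Sum of GHK steps along n pieces of the straight cone segment joining
    (radius 1, angle 0) to (radius 1, angle d); masses are squared radii and
    the angle plays the role of the position on the real line."""
    total = 0.0
    prev_r, prev_angle = 1.0, 0.0
    for k in range(1, n + 1):
        t = k / n
        px = (1.0 - t) + t * math.cos(d)   # point on the segment, in the plane
        py = t * math.sin(d)
        r, angle = math.hypot(px, py), math.atan2(py, px)
        total += ghk_diracs(prev_r ** 2, r ** 2, angle - prev_angle)
        prev_r, prev_angle = r, angle
    return total

d = 1.0
hk = hk_diracs(1.0, 1.0, d)     # equals the chord length 2 sin(d/2)
ghk = ghk_diracs(1.0, 1.0, d)   # strictly smaller than hk
chains = {n: ghk_chain_length(d, n) for n in (1, 10, 100, 1000)}
print(hk, ghk, chains)
```

In this example the one-step value is strictly below HK, while the chained sums stay below HK and approach it as the partition is refined, which is consistent with HK being the length distance generated by GHK.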

The plan of the paper is as follows. 

Part I develops the general theory of Optimal Entropy-Transport problems. Section 2
collects some preliminary material, in particular concerning the measure-theoretic setting
in arbitrary Hausdorff topological spaces (here we follow [H]) and entropy functionals.
We devote some effort to deal with general functionals (allowing a singular part in the
definition (1.3)) in order to include entropies which may have only linear growth. The
extension to this general framework of the duality Theorem 2.7 (well known in Polish
topologies) requires some care and the use of lower semicontinuous test functions instead
of continuous ones.

Section 3 introduces the class of Entropy-Transport problems, discussing some examples
and proving a general existence result for optimal plans. The “reverse” formulation
of Theorem 3.11, though simple, justifies the importance of dealing with the largest class of
entropies and will play a crucial role in Section 5.

Section 4 is devoted to finding the dual formulation, to prove its equivalence with the
primal problem (cf. Theorem 4.11), to derive sharp optimality conditions (cf. Theorem
4.6) and to prove the existence of optimal potentials in a suitable generalized sense (cf.
Theorem 4.15). The particular class of “regular” problems (where the results are richer)
is also studied in some detail.

Section 5 introduces the third formulation (1.12) based on the marginal perspective
function (1.11) and its “homogeneous” version (Section 5.2). The proof of the equivalence
with the previous formulations is presented in Theorem 5.5 and Theorem 5.8. This part
provides the crucial link for the further development in the cone setting.

Part II is devoted to Logarithmic Entropy-Transport (LET) problems (Section 6) and
to their applications to the Hellinger-Kantorovich distance HK on $\mathcal M(X)$.

The Hellinger-Kantorovich distance is introduced by the lifting technique in the cone
space in Section 7, where we try to follow a presentation modeled on the standard one
for the Kantorovich-Wasserstein distance, independently from the results on the LET-problems.
After a brief recap on the cone geometry (Section 7.1) we discuss in some
detail the crucial notion of homogeneous marginals in Section 7.2 and the useful tightness
conditions (Lemma 7.3) for plans with prescribed homogeneous marginals. Section 7.3
introduces the definition of the HK distance and its basic properties. The crucial rescaling
and gluing techniques are discussed in Section 7.4: they lie at the core of the main metric
properties of HK, leading to the proof of the triangle inequality and to the characterizations
of various metric and topological properties in Section 7.5. The equivalence with the LET
formulation is the main achievement of Section [72] (Theorem 7.20), with applications to
the duality formula (Theorem 7.21), to the comparisons with the classical Hellinger and
Kantorovich distances (Section [TT]) and with the Gaussian Hellinger-Kantorovich distance
(Section 7.8).

The last Section of the paper collects various important properties of HK, that share a
common “dynamic” flavor. After a preliminary discussion of absolutely continuous curves
and geodesics in the cone space $\mathfrak C$ in Section 8.1, we derive the basic superposition principle
in Theorem 8.4. This is the cornerstone to obtain a precise characterization of geodesics
(Theorem [82]), a sharp lower curvature bound in the Alexandrov sense (Theorem 8.8)
and to prove the dynamic characterization à la Benamou-Brenier of Section 8.5. The
other powerful tool is provided by the duality with subsolutions to the Hamilton-Jacobi
equation (Theorem 8.12), which we derive after a preliminary characterization of metric
slopes for a suitable class of test functions in $\mathfrak C$. One of the most striking results of Section
8.4 is the explicit representation formula for solutions to the Hamilton-Jacobi equation in
$X$, that we obtain by a careful reduction technique from the Hopf-Lax formula in $\mathfrak C$. In
this respect, we think that Theorem 8.11 is interesting by itself and could find important
applications in different contexts. From the point of view of Entropy-Transport problems,
Theorem 8.11 is particularly relevant since it provides a dynamic interpretation of the dual
characterization of the LET functional. In Section 8.6 we show that in the Euclidean case
$X = \mathbb R^d$ all geodesic curves are characterized by the system (1.29). The last Section 8.7
provides various contraction results: in particular we extend the well known contraction
property of the Heat flow in spaces with nonnegative Riemannian Ricci curvature to HK.

Note during final preparation. The earliest parts of the work developed here were
first presented at the ERC Workshop on Optimal Transportation and Applications in
Pisa in 2012. Since then the authors developed the theory continuously further and
presented results at different workshops and seminars, see Appendix A for some remarks
concerning the chronological development of our theory. In June 2015 they became aware
of the parallel work [21], which mainly concerns the dynamical approach to the Hellinger-Kantorovich
distance discussed in Section 8.5 and the metric-topological properties of
Section [721 in the Euclidean case. Moreover, in mid August 2015 we became aware of the
work mn which starts from the dynamical formulation of the Hellinger-Kantorovich
distance in the Euclidean case, proves existence of geodesics and sufficient optimality and
uniqueness conditions (which we state in a stronger form in Section 8.6) with a precise
characterization in the case of a couple of Dirac masses, provides a detailed discussion
of curvature properties following Otto’s formalism [33], and studies more general dynamic
costs on the cone space with their equivalent primal and dual static formulation (leading
to characterizations analogous to fIT.ip and (6.14) in the Hellinger-Kantorovich case).

Apart from the few above remarks, these independent works did not influence the first
(cf. arXiv:1508.07941v1) and the present version of this manuscript, which is essentially a
minor modification and correction of the first version. In the final Appendix we give a
brief account of the chronological development of our theory.

Main notation 

$\mathcal M(X)$                                                    finite positive Radon measures on a Hausdorff topological space $X$
$\mathcal P(X)$, $\mathcal P_2(X)$                                 Radon probability measures on $X$ (with finite quadratic moment)
$\mathscr B(X)$                                                    Borel subsets of $X$
$T_\sharp\mu$                                                      push forward of $\mu\in\mathcal M(X)$ by a map $T : X\to Y$: (2.5)
$\gamma=\sigma\mu+\gamma^\perp$, $\mu=\varrho\gamma+\mu^\perp$     Lebesgue decompositions of $\gamma$ and $\mu$, Lemma 2.3
$C_b(X)$                                                           continuous and bounded real functions on $X$
$\mathrm{Lip}_b(X)$, $\mathrm{Lip}_{bs}(X)$                        bounded (with bounded support) Lipschitz real functions on $X$
$\mathrm{LSC}_b(X)$, $\mathrm{LSC}_s(X)$                           lower semicontinuous and bounded (or simple) real functions on $X$
$\mathrm{USC}_b(X)$, $\mathrm{USC}_s(X)$                           upper semicontinuous and bounded (or simple) real functions on $X$
$\mathrm B(X)$, $\mathrm B_b(X)$                                   Borel (resp. bounded Borel) real functions
$L^p(X,\mu)$, $L^p(X,\mu;\mathbb R^d)$                             Borel $\mu$-integrable real (or $\mathbb R^d$-valued) functions
$\Gamma(\mathbb R_+)$                                              set of admissible entropy functions, see (2.13), (2.14)
$F(s)$, $F_i(s)$                                                   admissible entropy functions
$F^*(\varphi)$, $F_i^*(\varphi_i)$                                 Legendre transform of $F$, $F_i$, see (2.17)
$F^\circ(\varphi)$, $F_i^\circ(\varphi_i)$                         concave conjugate of an entropy function, see (2.43)
$R(r)$, $R_i(r_i)$                                                 reversed entropies, see (2.28)
$H_c(r_1,r_2)$, $H(x_1,r_1;x_2,r_2)$                               marginal perspective function, see (5.1), (5.9), (5.3)
$c(x_1,x_2)$                                                       lower semicontinuous cost function defined in $X = X_1\times X_2$
$\mathscr F(\gamma|\mu)$, $\mathscr R(\mu|\gamma)$                 entropy functionals and their reverse form, see (2.34) and (2.55)
$\mathscr E(\gamma|\mu_1,\mu_2)$, $\mathsf{ET}(\mu_1,\mu_2)$       general Entropy-Transport functional and its minimum, see (3.4)
$\mathscr D(\varphi|\mu_1,\mu_2)$, $\mathsf D(\mu_1,\mu_2)$        dual functional and its supremum, see (4.10) and (4.8)
                                                                   set of admissible Entropy-Kantorovich potentials
$\mathsf{LET}(\mu_1,\mu_2)$, $\ell(d)$                             Logarithmic Entropy Transport functional and its cost: Section 6
$W_d(\mu_1,\mu_2)$                                                 Kantorovich-Wasserstein distance in $\mathcal P_2(X)$
$\mathsf{HK}(\mu_1,\mu_2)$                                         Hellinger-Kantorovich distance in $\mathcal M(X)$: Section 7.3
$\mathsf{GHK}(\mu_1,\mu_2)$                                        Gaussian Hellinger-Kantorovich distance in $\mathcal M(X)$: Section 7.8
$(\mathfrak C, d_{\mathfrak C})$, $\mathfrak o$                    metric cone and its vertex, see Section 7.1
$\mathfrak C[r]$                                                   ball of radius $r$ centered at $\mathfrak o$ in $\mathfrak C$
$\mathfrak h_i^2$, $\mathrm{dil}_{\theta,2}(\cdot)$                homogeneous marginals and dilations, see (7.15), (7.16)
                                                                   plans in $\mathfrak C\times\mathfrak C$ with constrained homogeneous marginals, see (7.20)
$\mathrm{AC}^p([0,1];X)$                                           space of curves $x : [0,1]\to X$ with $p$-integrable metric speed
$|x'|_d$                                                           metric speed of a curve $x\in\mathrm{AC}([a,b];(X,d))$, Sect. 8.1
$|D_Z f|$, $|D_Z f|_a$                                             metric slope and asymptotic Lipschitz constant in $Z$, see (8.34)


Part I. Optimal Entropy-Transport problems 

2 Preliminaries 

2.1 Measure theoretic notation 

Positive Radon measures, narrow and weak convergence, tightness. Let $(X,\tau)$
be a Hausdorff topological space. We will denote by $\mathscr B(X)$ the $\sigma$-algebra of its Borel sets
and by $\mathcal M(X)$ the set of finite nonnegative Radon measures on $X$ [H], i.e. $\sigma$-additive set
functions $\mu : \mathscr B(X)\to[0,\infty)$ such that

$$\forall B\in\mathscr B(X),\ \forall\varepsilon>0 \quad \exists K_\varepsilon\subset B \ \text{compact such that}\ \mu(B\setminus K_\varepsilon)\le\varepsilon. \tag{2.1}$$
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Radon measures have strong continuity property with respect to monotone convergence. 
For this, denote by LSC(X) the space of all lower semicontinuous real-valued functions 
on X and consider a nondecreasing directed family (/A)AeL C LSC(X) (where L is a 
possibly uncountable directed set) of nonnegative and lower semicontinuous functions f\ 
converging to /, we have (cf. [HI Prop. 5, p.42]) 

lim / 

-''SL J ^ 

We endow M(X) with the narrow topology, the coarsest (Hausdorff) topology for which 
all the maps fi ^ 99 d/i are lower semicontinuous, as (^ : X —?■ M varies among the set 

LSCb(X) of all bounded lower semicontinuous functions [HI p. 370, Def. 1]. 

Remark 2.1 (Radon versus Borel, narrow versus weak). When {X,t) is a Radon space 

(in particular a Polish, or Lusin or Souslin space m p. 122 ]) then every Borel measure 

satishes m, so that M(X) coincides with the set of all nonnegative and hnite Borel 
measures. Narrow topology is in general stronger than the standard weak topology in¬ 
duced by the duality with continuous and bounded functions of Cb{X). However, when 
(X, r) is completely regular, i.e. 

for any closed set F C X and any xq G X \ F 
there exists / G F;,(X) with /(xq) > 0 and / = 0 on F, 

(in particular when r is metrizable), narrow and weak topology coincide [HI P- 371]. 
Therefore when (X, r) is a Polish space we recover the usual setting of Borel measures 
endowed with the weak topology. □ 

A set X C M(X) is bounded if sup^g 3 ^/i(X) < 00 ; it is equally tight if 

V£>0 dXgCX compact such that /i(X \ K^) < e for every /i G X. (2.4) 

Compactness with respect to narrow topology is guaranteed by an extended version of 
Prokhorov’s Theorem m Thm. 3, p. 379]. Tightness of weakly convergent sequences in 
metrizable spaces is due to Le Cam [26] . 

Theorem 2.2. If a subset X C M(X) is bounded and equally tight then it is relatively 
compact with respect to the narrow topology. The converse is also true in the following 
cases: 

(i) (X, r) is a locally compact or a Polish space; 

(a) (X, r) is metrizable and X = {fin : n G M} for a given weakly convergent sequence 
iPn) ■ 

If $\mu\in\mathcal M(X)$ and $Y$ is another Hausdorff topological space, a map $T:X\to Y$ is Lusin
$\mu$-measurable [?, Ch. I, Sec. 5] if for every $\varepsilon>0$ there exists a compact set $K_\varepsilon\subset X$
such that $\mu(X\setminus K_\varepsilon)\le\varepsilon$ and the restriction of $T$ to $K_\varepsilon$ is continuous. We denote by
$T_\sharp\mu\in\mathcal M(Y)$ the push-forward measure defined by

\[ T_\sharp\mu(B):=\mu(T^{-1}(B)) \quad\text{for every } B\in\mathcal B(Y). \tag{2.5} \]

For $\mu\in\mathcal M(X)$ and a Lusin $\mu$-measurable $T:X\to Y$, we have $T_\sharp\mu\in\mathcal M(Y)$. The linear
space $\mathrm B(X)$ (resp. $\mathrm B_b(X)$) denotes the space of real Borel (resp. bounded Borel) functions.
If $\mu\in\mathcal M(X)$ and $p\in[1,\infty]$, we will denote by $L^p(X,\mu)$ the subspace of Borel $p$-integrable
functions w.r.t. $\mu$, without identifying $\mu$-almost equal functions.


\[ \lim_{\lambda}\int_X f_\lambda\,\mathrm d\mu=\int_X f\,\mathrm d\mu \quad\text{for all } \mu\in\mathcal M(X). \tag{2.2} \]
Lebesgue decomposition. Given $\gamma,\mu\in\mathcal M(X)$, we write $\gamma\ll\mu$ if $\mu(A)=0$ yields
$\gamma(A)=0$ for every $A\in\mathcal B(X)$. We say that $\gamma\perp\mu$ if there exists $B\in\mathcal B(X)$ such that
$\mu(B)=0=\gamma(X\setminus B)$.

Lemma 2.3 (Lebesgue decomposition). For every $\gamma,\mu\in\mathcal M(X)$ (with $(\gamma+\mu)(X)>0$),
there exist Borel functions $\sigma,\varrho:X\to[0,\infty)$ and a Borel partition $(A,A_\gamma,A_\mu)$ of $X$ with
the following properties:

\[ A=\{x\in X:\sigma(x)>0\}=\{x\in X:\varrho(x)>0\},\qquad \sigma\cdot\varrho\equiv1 \text{ in } A, \tag{2.6} \]

\[ \gamma=\sigma\mu+\gamma^\perp,\quad \sigma\in L^1_+(X,\mu),\quad \gamma^\perp\perp\mu,\quad \gamma^\perp(X\setminus A_\gamma)=\mu(A_\gamma)=0, \tag{2.7} \]

\[ \mu=\varrho\gamma+\mu^\perp,\quad \varrho\in L^1_+(X,\gamma),\quad \mu^\perp\perp\gamma,\quad \mu^\perp(X\setminus A_\mu)=\gamma(A_\mu)=0. \tag{2.8} \]

Moreover, the sets $A,A_\gamma,A_\mu$ and the densities $\sigma,\varrho$ are uniquely determined up to
$(\mu+\gamma)$-negligible sets.

Proof. Let $\theta\in\mathrm B(X;[0,1])$ be the Lebesgue density of $\gamma$ w.r.t. $\nu:=\mu+\gamma$. Thus, $\theta$ is
uniquely determined up to $\nu$-negligible sets. The Borel partition can be defined by setting
$A:=\{x\in X:0<\theta(x)<1\}$, $A_\gamma:=\{x\in X:\theta(x)=1\}$ and $A_\mu:=\{x\in X:\theta(x)=0\}$.
By defining $\sigma:=\theta/(1-\theta)$, $\varrho:=1/\sigma=(1-\theta)/\theta$ for every $x\in A$ and $\sigma=\varrho=0$ in $X\setminus A$,
we obtain Borel functions satisfying (2.7) and (2.8).

Conversely, it is not difficult to check that starting from a decomposition as in (2.6),
(2.7), and (2.8) and defining $\theta\equiv0$ in $A_\mu$, $\theta\equiv1$ in $A_\gamma$ and $\theta:=\sigma/(1+\sigma)$ in $A$ we obtain
a Borel function with values in $[0,1]$ such that $\gamma=\theta(\mu+\gamma)$. □
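As an illustration only, the construction used in the proof can be carried out for measures on a finite set, where a measure is a vector of nonnegative weights. The function name `lebesgue_decomposition` and the string labels for the partition are hypothetical choices made for this sketch.

```python
def lebesgue_decomposition(gamma, mu):
    """Lebesgue decomposition of two finite measures on a finite set.

    gamma, mu: lists of nonnegative weights indexed by the points of X.
    Returns (sigma, rho, part); part[k] names the set of the partition
    (A, A_gamma, A_mu) containing the k-th point.
    """
    sigma, rho, part = [], [], []
    for g, m in zip(gamma, mu):
        nu = g + m
        # theta is the density of gamma w.r.t. nu = mu + gamma;
        # points with nu = 0 are negligible and are put in A_mu here
        theta = g / nu if nu > 0 else 0.0
        if 0.0 < theta < 1.0:
            # theta / (1 - theta) equals g / m; the quotient avoids rounding
            sigma.append(g / m)
            rho.append(m / g)
            part.append("A")
        else:
            sigma.append(0.0)
            rho.append(0.0)
            part.append("A_gamma" if theta == 1.0 else "A_mu")
    return sigma, rho, part


gamma = [2.0, 0.0, 3.0, 1.0]
mu = [1.0, 4.0, 0.0, 1.0]
sigma, rho, part = lebesgue_decomposition(gamma, mu)
```

In this example the third point carries the singular part of $\gamma$ and the second point the singular part of $\mu$, while $\sigma\varrho=1$ on the remaining two points.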

2.2 Min-max and duality 

We recall now a powerful form of von Neumann's Theorem, concerning minimax properties
of convex-concave functions in convex subsets of vector spaces, and refer to
[?, Prop. 1.2+3.2, Chap. VI] for a general exposition.

Let $A,B$ be nonempty convex sets of some vector spaces and let us suppose that $A$ is
endowed with a Hausdorff topology. Let $L:A\times B\to\mathbb R$ be a function such that

\[ a\mapsto L(a,b) \text{ is convex and lower semicontinuous in } A \text{ for every } b\in B, \tag{2.9a} \]
\[ b\mapsto L(a,b) \text{ is concave in } B \text{ for every } a\in A. \tag{2.9b} \]

Notice that for arbitrary functions $L$ one always has

\[ \inf_{a\in A}\sup_{b\in B}L(a,b)\ \ge\ \sup_{b\in B}\inf_{a\in A}L(a,b), \tag{2.10} \]

so that equality holds in (2.10) if $\sup_{b\in B}\inf_{a\in A}L(a,b)=+\infty$. When $\sup_{b\in B}\inf_{a\in A}L(a,b)$
is finite, we can still have equality thanks to the following result.

The statement has the advantage of involving a minimal set of topological assumptions
(we refer to [?, Thm. 3.1] for the proof, see also [?, Chapter 1, Prop. 1.1]).

Theorem 2.4 (Minimax duality). Assume that (2.9a) and (2.9b) hold. If there exist
$b_\star\in B$ and $C>\sup_{b\in B}\inf_{a\in A}L(a,b)$ such that

\[ \{a\in A : L(a,b_\star)\le C\} \text{ is compact in } A, \tag{2.11} \]

then

\[ \inf_{a\in A}\sup_{b\in B}L(a,b)=\sup_{b\in B}\inf_{a\in A}L(a,b). \tag{2.12} \]
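The role of the convex-concave structure can be seen in a small numerical experiment. The sketch below is an illustration under our own assumptions (a brute-force evaluation on finite grids, with the hypothetical helper `minimax_gap`); it is not the argument used in the proof. It compares both sides of (2.10) for a convex-concave function and for a function that is convex in both variables.

```python
def minimax_gap(L, A, B):
    """Return (inf_a sup_b L, sup_b inf_a L) over the finite grids A and B."""
    inf_sup = min(max(L(a, b) for b in B) for a in A)
    sup_inf = max(min(L(a, b) for a in A) for b in B)
    return inf_sup, sup_inf


def saddle(a, b):
    # convex in a, concave in b: the two values coincide
    return a * a + a * b - b * b


def not_concave(a, b):
    # convex in a but also convex in b: a gap appears
    return (a - b) ** 2


grid = [k / 50.0 for k in range(-50, 51)]   # grid on [-1, 1]
unit = [k / 50.0 for k in range(0, 51)]     # grid on [0, 1]
saddle_values = minimax_gap(saddle, grid, grid)
gap_values = minimax_gap(not_concave, unit, unit)
```

For the first function both values agree, whereas for the second one the inequality in (2.10) is strict on these grids.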
2.3 Entropy functions and their conjugates

Entropy functions in $[0,\infty)$. We say that $F:[0,\infty)\to[0,\infty]$ belongs to the class
$\Gamma(\mathbb R_+)$ of admissible entropy functions if it satisfies

\[ F \text{ is convex and lower semicontinuous with } \mathrm{Dom}(F)\cap(0,\infty)\neq\emptyset, \tag{2.13} \]

where

\[ \mathrm{Dom}(F):=\{s\ge0 : F(s)<\infty\},\qquad s_F^-:=\inf\mathrm{Dom}(F),\qquad s_F^+:=\sup\mathrm{Dom}(F)>0. \tag{2.14} \]

The recession constant $F'_\infty$, the right derivative $F'_0$ at $0$, and the asymptotic affine
coefficient $\mathrm{aff}F_\infty$ are defined by (here $s_o\in\mathrm{Dom}(F)$)

\[ F'_\infty:=\lim_{s\to\infty}\frac{F(s)}{s}=\sup_{s>0}\frac{F(s)-F(s_o)}{s-s_o},\qquad F'_0:=\begin{cases}-\infty & \text{if } F(0)=+\infty,\\ \lim_{s\downarrow0}\dfrac{F(s)-F(0)}{s} & \text{otherwise,}\end{cases} \tag{2.15} \]

\[ \mathrm{aff}F_\infty:=\begin{cases}+\infty & \text{if } F'_\infty=+\infty,\\ \lim_{s\to\infty}\big(F'_\infty s-F(s)\big) & \text{otherwise.}\end{cases} \tag{2.16} \]

To avoid trivial cases, we assumed in (2.13) that the proper domain $\mathrm{Dom}(F)$ contains at
least a strictly positive real number. By convexity, $\mathrm{Dom}(F)$ is a subinterval of $[0,\infty)$,
and we will mainly focus on the case when $\mathrm{Dom}(F)$ has nonempty interior and $F$ has
superlinear growth, i.e. $F'_\infty=+\infty$, but it will be useful to deal with the general class
defined by (2.13).


Legendre duality. As usual, the Legendre conjugate function $F^*:\mathbb R\to(-\infty,+\infty]$ is
defined by

\[ F^*(\phi):=\sup_{s\ge0}\big(s\phi-F(s)\big), \tag{2.17} \]

with proper domain $\mathrm{Dom}(F^*):=\{\phi\in\mathbb R : F^*(\phi)\in\mathbb R\}$. Strictly speaking, $F^*$ is the
conjugate of the convex function $\tilde F:\mathbb R\to(-\infty,+\infty]$, obtained by extending $F$ to $+\infty$
for negative arguments. Notice that

\[ \inf\mathrm{Dom}(F^*)=-\infty,\qquad \sup\mathrm{Dom}(F^*)=F'_\infty, \tag{2.18} \]

so that $F^*$ is finite and continuous in $(-\infty,F'_\infty)$, nondecreasing, and satisfies

\[ \lim_{\phi\downarrow-\infty}F^*(\phi)=\inf F^*=-F(0),\qquad \sup F^*=\lim_{\phi\uparrow+\infty}F^*(\phi)=+\infty. \tag{2.19} \]

Concerning the behavior of $F^*$ at the boundary of its proper domain we can distinguish
a few cases depending on the behavior of $F$ at $s_F^-$ and $s_F^+$:

• If $F'_0=-\infty$ (in particular if $F(0)=+\infty$) then $F^*$ is strictly increasing in $\mathrm{Dom}(F^*)$.

• If $F'_0$ is finite, then $F^*$ is strictly increasing in $[F'_0,F'_\infty)$ and takes the constant value
$-F(0)$ in $(-\infty,F'_0]$. Thus $-F(0)$ belongs to the range of $F^*$ only if $F'_0>-\infty$.

• If $F'_\infty$ is finite, then $\lim_{\phi\uparrow F'_\infty}F^*(\phi)=\mathrm{aff}F_\infty$. Thus $F'_\infty\in\mathrm{Dom}(F^*)$ only if $\mathrm{aff}F_\infty<\infty$.

• The degenerate case when $F'_\infty=F'_0$ occurs only when $F$ is linear.

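If $F$ is not linear, we always have

\[ F^* \text{ is an increasing homeomorphism between } (F'_0,F'_\infty) \text{ and } (-F(0),\mathrm{aff}F_\infty), \tag{2.20} \]

with the obvious extensions to the boundaries of the intervals when $F'_0$ or $\mathrm{aff}F_\infty$ are finite.
By introducing the closed convex subset $\mathfrak F$ of $\mathbb R^2$ via

\[ \mathfrak F:=\{(\phi,\psi)\in\mathbb R^2 : \psi\le-F^*(\phi)\}=\{(\phi,\psi)\in\mathbb R^2 : s\phi+\psi\le F(s)\ \ \forall\,s>0\}, \tag{2.21} \]

the function $F$ can be recovered from $F^*$ and from $\mathfrak F$ through the dual Fenchel-Moreau
formula

\[ F(s)=\sup_{\phi\in\mathbb R}\big(s\phi-F^*(\phi)\big)=\sup_{(\phi,\psi)\in\mathfrak F}\big(s\phi+\psi\big). \tag{2.22} \]

Notice that $\mathfrak F$ satisfies the obvious monotonicity property

\[ (\phi,\psi)\in\mathfrak F,\quad \tilde\psi\le\psi,\quad \tilde\phi\le\phi \quad\Longrightarrow\quad (\tilde\phi,\tilde\psi)\in\mathfrak F. \tag{2.23} \]

If $F$ is finite in a neighborhood of $+\infty$, then $F^*$ is superlinear as $\phi\uparrow\infty$. More precisely,
its asymptotic behavior as $\phi\to\pm\infty$ is related to the proper domain of $F$ by

\[ s_F^\pm=\lim_{\phi\to\pm\infty}\frac{F^*(\phi)}{\phi}. \tag{2.24} \]

The functions $F$ and $F^*$ are also related to the subdifferential $\partial F:\mathbb R\to2^{\mathbb R}$ by

\[ \phi\in\partial F(s)\quad\Longleftrightarrow\quad s\in\mathrm{Dom}(F),\ \ \phi\in\mathrm{Dom}(F^*),\ \ F(s)+F^*(\phi)=s\phi. \tag{2.25} \]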

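Example 2.5 (Power-like entropies). An important class of entropy functions is provided
by the power-like functions $U_p:[0,\infty)\to[0,\infty]$ with $p\in\mathbb R$ characterized by

\[ U_p\in C^\infty(0,\infty),\qquad U_p(1)=U_p'(1)=0,\qquad U_p''(s)=s^{p-2},\qquad U_p(0)=\lim_{s\downarrow0}U_p(s). \tag{2.26} \]

Equivalently, we have the explicit formulas

\[ U_p(s)=\begin{cases}\dfrac{1}{p(p-1)}\big(s^p-p(s-1)-1\big) & \text{if } p\neq0,1,\\ s\log s-s+1 & \text{if } p=1,\\ s-1-\log s & \text{if } p=0,\end{cases}\qquad\text{for } s>0, \tag{2.27} \]

with $U_p(0)=1/p$ if $p>0$ and $U_p(0)=+\infty$ if $p\le0$.

Using the dual exponent $q=p/(p-1)$, the corresponding Legendre conjugates read

\[ U_p^*(\phi):=\begin{cases}\dfrac{q-1}{q}\Big[\Big(1+\dfrac{\phi}{q-1}\Big)_+^{q}-1\Big], & \mathrm{Dom}(U_p^*)=\mathbb R, & \text{if } p>1,\ q>1,\\[2mm] e^{\phi}-1, & \mathrm{Dom}(U_p^*)=\mathbb R, & \text{if } p=1,\ q=\infty,\\[2mm] \dfrac{q-1}{q}\Big[\Big(1+\dfrac{\phi}{q-1}\Big)^{q}-1\Big], & \mathrm{Dom}(U_p^*)=(-\infty,1-q), & \text{if } 0<p<1,\ q<0,\\[2mm] -\log(1-\phi), & \mathrm{Dom}(U_p^*)=(-\infty,1), & \text{if } p=0,\ q=0,\\[2mm] \dfrac{q-1}{q}\Big[\Big(1+\dfrac{\phi}{q-1}\Big)^{q}-1\Big], & \mathrm{Dom}(U_p^*)=(-\infty,1-q], & \text{if } p<0,\ 0<q<1.\end{cases} \]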
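The closed-form conjugates can be compared with a brute-force evaluation of the supremum in (2.17). The following sketch is only a numerical sanity check under stated assumptions: the function names `U`, `U_conj`, `conj_numeric` are hypothetical, the supremum is taken over a finite grid of $s\in(0,50]$, and $\phi$ is chosen inside the proper domain of $U_p^*$.

```python
import math


def U(p, s):
    """Power-like entropy U_p(s) for s > 0, following (2.27)."""
    if p == 1:
        return s * math.log(s) - s + 1.0
    if p == 0:
        return s - 1.0 - math.log(s)
    return (s ** p - p * (s - 1.0) - 1.0) / (p * (p - 1.0))


def U_conj(p, phi):
    """Closed-form Legendre conjugate of U_p for phi in its proper domain."""
    if p == 1:
        return math.exp(phi) - 1.0
    if p == 0:
        return -math.log(1.0 - phi)
    q = p / (p - 1.0)                 # dual exponent
    base = 1.0 + phi / (q - 1.0)
    if p > 1:
        base = max(base, 0.0)         # positive part, only needed for p > 1
    return (q - 1.0) / q * (base ** q - 1.0)


def conj_numeric(p, phi, s_max=50.0, n=20000):
    """Approximate sup_{s>0} (s*phi - U_p(s)) on a uniform grid of (0, s_max]."""
    grid = (s_max * k / n for k in range(1, n + 1))
    return max(s * phi - U(p, s) for s in grid)


samples = [(2, 0.7), (1, 0.3), (0, -1.0), (0.5, 0.5), (-1, 0.2)]
errors = [abs(U_conj(p, phi) - conj_numeric(p, phi)) for p, phi in samples]
```

The grid maximum can only underestimate the true supremum, so small positive discrepancies are expected.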
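Reverse entropies. Let us now introduce the reverse density function $R:[0,\infty)\to[0,\infty]$ as

\[ R(r):=\begin{cases} rF(1/r) & \text{if } r>0,\\ F'_\infty & \text{if } r=0.\end{cases} \tag{2.28} \]

It is not difficult to check that $R$ is a proper, convex and lower semicontinuous function,
with

\[ R(0)=F'_\infty,\qquad R'_\infty=F(0),\qquad \mathrm{aff}F_\infty=-R'_0,\qquad \mathrm{aff}R_\infty=-F'_0, \tag{2.29} \]

so that $R\in\Gamma(\mathbb R_+)$ and the map $F\mapsto R$ is an involution on $\Gamma(\mathbb R_+)$. A further remarkable
involution property is enjoyed by the dual convex set $\mathfrak R:=\{(\psi,\phi)\in\mathbb R^2 : R^*(\psi)+\phi\le0\}$
defined as in (2.21): it is easy to check that

\[ (\phi,\psi)\in\mathfrak F\quad\Longleftrightarrow\quad(\psi,\phi)\in\mathfrak R. \tag{2.30} \]

It follows that the Legendre transforms of $R$ and $F$ are related by

\[ \psi\le-F^*(\phi)\quad\Longleftrightarrow\quad\phi\le-R^*(\psi)\quad\Longleftrightarrow\quad(\phi,\psi)\in\mathfrak F\qquad\text{for every } \phi,\psi\in\mathbb R. \tag{2.31} \]

As in (2.20) we have

\[ R^* \text{ is an increasing homeomorphism between } (-\mathrm{aff}F_\infty,F(0)) \text{ and } (-F'_\infty,-F'_0). \tag{2.32} \]

A last useful identity involves the subdifferentials of $F$ and $R$: for every $s,r>0$ with
$sr=1$, and $\phi,\psi\in\mathbb R$ we have

\[ \big(\phi\in\partial F(r) \text{ and } \psi=-F^*(\phi)\big)\quad\Longleftrightarrow\quad\big(\psi\in\partial R(s) \text{ and } \phi=-R^*(\psi)\big). \tag{2.33} \]

It is not difficult to check that the reverse entropy associated to $U_p$ is $U_{1-p}$.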

2.4 Relative entropy integral functionals 

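For $F\in\Gamma(\mathbb R_+)$ we consider the functional $\mathcal F:\mathcal M(X)\times\mathcal M(X)\to[0,\infty]$ defined by

\[ \mathcal F(\gamma|\mu):=\int_X F(\sigma)\,\mathrm d\mu+F'_\infty\,\gamma^\perp(X),\qquad \gamma=\sigma\mu+\gamma^\perp,\quad \gamma^\perp\perp\mu, \tag{2.34} \]

where $\gamma=\sigma\mu+\gamma^\perp$ is the Lebesgue decomposition of $\gamma$ w.r.t. $\mu$, see (2.7). Notice that

\[ \text{if } F \text{ is superlinear then } \mathcal F(\gamma|\mu)=+\infty \text{ if } \gamma\not\ll\mu, \tag{2.35} \]

and, whenever $\eta_0$ is the null measure, we have

\[ \mathcal F(\gamma|\eta_0)=F'_\infty\,\gamma(X), \tag{2.36} \]

where, as usual in measure theory, we adopted the convention $0\cdot\infty=0$.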
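For measures on a finite set the functional (2.34) reduces to a finite sum. The sketch below is a minimal illustration under that assumption; the names `relative_entropy`, `U1` and `V` are hypothetical, and the recession constant $F'_\infty$ must be supplied by the caller.

```python
import math


def relative_entropy(F, F_inf, gamma, mu):
    """Discrete version of (2.34): sum of F(sigma)*mu plus F'_inf * gamma_perp(X).

    gamma, mu: lists of nonnegative weights on the same finite set.
    F_inf: the recession constant of F (math.inf for superlinear F).
    """
    total = 0.0
    for g, m in zip(gamma, mu):
        if m > 0:
            total += m * F(g / m)   # absolutely continuous part, sigma = g / m
        elif g > 0:
            total += F_inf * g      # singular part; 0 * inf is never formed
    return total


def U1(s):
    """Logarithmic entropy, with the limit value U_1(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0


def V(s):
    """Piecewise affine entropy |s - 1|, whose recession constant is 1."""
    return abs(s - 1.0)


kl_value = relative_entropy(U1, math.inf, [1.0, 2.0], [1.0, 1.0])
singular = relative_entropy(U1, math.inf, [1.0, 0.0, 0.5], [1.0, 1.0, 0.0])
tv_value = relative_entropy(V, 1.0, [1.0, 0.0, 0.5], [1.0, 1.0, 0.0])
null_ref = relative_entropy(V, 1.0, [0.25, 0.75], [0.0, 0.0])
```

The second call illustrates (2.35), since the measure has a singular part and the entropy is superlinear, and the last call illustrates (2.36).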
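Because of our later applications, our next lemma deals with Borel functions
$\phi\in\mathrm B(X;\bar{\mathbb R})$ taking values in the extended real line $\bar{\mathbb R}:=\mathbb R\cup\{\pm\infty\}$. By $\bar{\mathfrak F}$ we denote the
closure of $\mathfrak F$ in $\bar{\mathbb R}\times\bar{\mathbb R}$, i.e.

\[ (\phi,\psi)\in\bar{\mathfrak F}\quad\Longleftrightarrow\quad\begin{cases}\psi\le-F^*(\phi) & \text{if } -\infty<\phi\le F'_\infty,\ \phi<+\infty,\\ \psi=-\infty & \text{if } \phi=F'_\infty=+\infty,\\ \psi\in[-\infty,F(0)] & \text{if } \phi=-\infty,\end{cases} \tag{2.37} \]

and, symmetrically by (2.29) and (2.30),

\[ (\phi,\psi)\in\bar{\mathfrak F}\quad\Longleftrightarrow\quad\begin{cases}\phi\le-R^*(\psi) & \text{if } -\infty<\psi\le F(0),\ \psi<+\infty,\\ \phi=-\infty & \text{if } \psi=F(0)=+\infty,\\ \phi\in[-\infty,F'_\infty] & \text{if } \psi=-\infty.\end{cases} \tag{2.38} \]

In particular, we have

\[ (\phi,\psi)\in\bar{\mathfrak F}\quad\Longrightarrow\quad\big(\phi\le F'_\infty \text{ and } \psi\le F(0)\big). \tag{2.39} \]

We continue to use the notation $\phi_-$ and $\phi_+$ to denote the negative and the positive part
of a function $\phi$, where $\phi_-(x):=\min\{\phi(x),0\}$ and $\phi_+(x):=\max\{\phi(x),0\}$.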

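Lemma 2.6. If $\gamma,\mu\in\mathcal M(X)$ and $(\phi,\psi)\in\mathrm B(X;\bar{\mathfrak F})$ satisfy

\[ \mathcal F(\gamma|\mu)<\infty,\qquad \psi_-\in L^1(X,\mu)\quad(\text{resp. } \phi_-\in L^1(X,\gamma)), \]

then $\phi_+\in L^1(X,\gamma)$ (resp. $\psi_+\in L^1(X,\mu)$) and

\[ \mathcal F(\gamma|\mu)-\int_X\psi\,\mathrm d\mu\ \ge\ \int_X\phi\,\mathrm d\gamma. \tag{2.40} \]

Whenever $\psi\in L^1(X,\mu)$ or $\phi\in L^1(X,\gamma)$, equality holds in (2.40) if and only if for the
Lebesgue decomposition given by Lemma 2.3 one has

\[ \phi\in\partial F(\sigma),\qquad \psi=-F^*(\phi)\qquad(\mu+\gamma)\text{-a.e. in } A, \tag{2.41} \]

\[ \psi=F(0)<\infty\quad\mu^\perp\text{-a.e. in } A_\mu,\qquad \phi=F'_\infty<\infty\quad\gamma^\perp\text{-a.e. in } A_\gamma. \tag{2.42} \]

Equation (2.41) can equivalently be formulated as $\psi\in\partial R(\varrho)$ and $\phi=-R^*(\psi)$.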


Proof. Let us first show that in both cases the two integrals of (2.40) are well defined
(possibly taking the value $-\infty$). If $\psi_-\in L^1(X,\mu)$ (in particular $\psi>-\infty$ $\mu$-a.e.) with
$(\phi,\psi)\in\bar{\mathfrak F}$ we use the pointwise bound $s\phi\le F(s)-\psi$, which yields $s\phi_+\le(F(s)-\psi)_+\le
F(s)-\psi_-$, obtaining $\phi_+\in L^1(X,\gamma)$, since $(\phi,\psi)\in\bar{\mathfrak F}$ yields $\phi_+\le F'_\infty$.

If $\phi_-\in L^1(X,\gamma)$ (and thus $\phi>-\infty$ $\gamma$-a.e.) the analogous inequality $\psi_+\le F(s)-s\phi_-$
yields $\psi_+\in L^1(X,\mu)$. Then, (2.40) follows from (2.21) and (2.39).

Once $\psi\in L^1(X,\mu)$ (or $\phi\in L^1(X,\gamma)$), estimate (2.40) can be written as

\[ \int_A\big(F(\sigma)-\sigma\phi-\psi\big)\,\mathrm d\mu+\int_{A_\mu}\big(F(0)-\psi\big)\,\mathrm d\mu^\perp+\int_{A_\gamma}\big(F'_\infty-\phi\big)\,\mathrm d\gamma^\perp\ \ge\ 0, \]

and by (2.21) and (2.39) the equality case immediately yields that each of the three
integrals of the previous formula vanishes. Since $(\phi,\psi)$ lies in $\mathfrak F\subset\mathbb R^2$ $(\mu+\gamma)$-a.e. in $A$, the
vanishing of the first integrand yields $\psi=-F^*(\phi)$ and $\phi\in\partial F(\sigma)$ by (2.25) for
$(\mu+\gamma)$-almost every point in $A$. The equivalence (2.33) provides the reversed identities
$\psi\in\partial R(\varrho)$, $\phi=-R^*(\psi)$.

The relations in (2.42) follow easily by the vanishing of the last two integrals and the
fact that $\psi$ is finite $\mu$-a.e. and $\phi$ is finite $\gamma$-a.e. □

The next theorem gives a characterization of the relative entropy $\mathcal F$, which is the
main result of this section. Its proof is a careful adaptation of [2, Lemma 9.4.4] to the
present more general setting, which includes the sublinear case when $F'_\infty<\infty$ and the
lack of complete regularity of the space. This suggests dealing with lower semicontinuous
functions instead of continuous ones. We denote by $\mathrm{LSC}_s(X)$ the class of lower
semicontinuous and simple functions (i.e. taking a finite number of real values only) and introduce
the notation $\varphi=-\phi$ and the concave function

\[ F^\circ(\varphi):=-F^*(-\varphi). \tag{2.43} \]

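Theorem 2.7 (Duality and lower semicontinuity). For every $\gamma,\mu\in\mathcal M(X)$ we have

\[ \mathcal F(\gamma|\mu)=\sup\Big\{\int_X\psi\,\mathrm d\mu+\int_X\phi\,\mathrm d\gamma\ :\ \phi,\psi\in\mathrm{LSC}_s(X),\ (\phi(x),\psi(x))\in\mathfrak F\ \ \forall\,x\in X\Big\} \tag{2.44} \]

\[ \phantom{\mathcal F(\gamma|\mu)}=\sup\Big\{\int_X\psi\,\mathrm d\mu-\int_X R^*(\psi)\,\mathrm d\gamma\ :\ \psi,\ R^*(\psi)\in\mathrm{LSC}_s(X)\Big\} \tag{2.45} \]

\[ \phantom{\mathcal F(\gamma|\mu)}=\sup\Big\{\int_X F^\circ(\varphi)\,\mathrm d\mu-\int_X\varphi\,\mathrm d\gamma\ :\ \varphi,\ F^\circ(\varphi)\in\mathrm{LSC}_s(X)\Big\} \tag{2.46} \]

and the space $\mathrm{LSC}_s(X)$ in the suprema of (2.44), (2.45) and (2.46) can also be replaced
by the space $\mathrm{LSC}_b(X)$ (resp. $\mathrm B_b(X)$) of bounded l.s.c. (resp. Borel) functions.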
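On a finite set every function is simple and lower semicontinuous, so the duality (2.44) can be tested directly. The sketch below is an illustration under our own assumptions: it takes $F=U_1$, for which $F^*(\phi)=e^\phi-1$, uses strictly positive weights, and restricts to the boundary pairs $\psi=-F^*(\phi)$; the names `kl_entropy` and `dual_value` are hypothetical.

```python
import math
import random


def kl_entropy(gamma, mu):
    """F(gamma|mu) for F = U_1 when all weights are strictly positive."""
    return sum(m * ((g / m) * math.log(g / m) - g / m + 1.0)
               for g, m in zip(gamma, mu))


def dual_value(phi, gamma, mu):
    """Value of the functional in (2.44) for the pair (phi, psi = 1 - exp(phi))."""
    return sum((1.0 - math.exp(f)) * m + f * g
               for f, g, m in zip(phi, gamma, mu))


gamma = [0.5, 2.0, 1.0]
mu = [1.0, 1.0, 3.0]
primal = kl_entropy(gamma, mu)

# phi = F'(sigma) = log(sigma) attains the supremum, in line with (2.41)
phi_opt = [math.log(g / m) for g, m in zip(gamma, mu)]
attained = dual_value(phi_opt, gamma, mu)

random.seed(0)
trials = [dual_value([random.uniform(-3.0, 3.0) for _ in gamma], gamma, mu)
          for _ in range(1000)]
```

Every admissible pair gives a lower bound for the entropy, and the bound is attained at $\phi=\log\sigma$.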

Remark 2.8. If $(X,\tau)$ is completely regular (recall (2.3)), then we can equivalently
replace lower semicontinuous functions by continuous ones in (2.44), (2.45) and (2.46).
E.g. in the case of (2.44) we have

\[ \mathcal F(\gamma|\mu)=\sup\Big\{\int_X\psi\,\mathrm d\mu+\int_X\phi\,\mathrm d\gamma\ :\ (\phi,\psi)\in C_b(X;\mathfrak F)\Big\}. \tag{2.47} \]

In fact, considering first (2.44), by complete regularity it is possible to express every
couple $\phi,\psi$ of bounded lower semicontinuous functions with values in $\mathfrak F$ as the supremum
of a directed family of continuous and bounded functions $(\phi_\alpha,\psi_\alpha)_{\alpha\in\mathbb A}$ which still satisfy
the constraint $\mathfrak F$ due to (2.23). We can then apply the continuity (2.2) of the integrals
with respect to the Radon measures $\mu$ and $\gamma$.

In order to replace l.s.c. functions with continuous ones in (2.45) we can approximate
$\psi$ by an increasing directed family of continuous functions $(\psi_\alpha)_{\alpha\in\mathbb A}$. By truncation, one
can always assume that $\max\psi\ge\sup\psi_\alpha\ge\inf\psi_\alpha\ge\min\psi$. Since $R^*(\psi)$ is bounded,
it is easy to check that also $R^*(\psi_\alpha)$ is bounded and that it is an increasing directed family
converging to $R^*(\psi)$. An analogous argument works for (2.46). □
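Proof. Let us prove (2.44): denoting by $\mathcal F'$ its right-hand side, Lemma 2.6 yields $\mathcal F\ge\mathcal F'$.

In order to prove the opposite inequality let $B\in\mathcal B(X)$ be a $\mu$-negligible Borel set where $\gamma^\perp$
is concentrated, let $A:=X\setminus B$ and let $\sigma:X\to[0,\infty)$ be a Borel density for $\gamma$ w.r.t. $\mu$.
We consider a countable subset $(\phi_n,\psi_n)_{n=1}^\infty$ with $\psi_1=\phi_1=0$, which is dense in $\mathfrak F$, and an
increasing sequence $\bar\phi_n\in(-\infty,F'_\infty)$ converging to $F'_\infty$, with $\bar\psi_n:=-F^*(\bar\phi_n)$. By (2.22)
we have

\[ F(\sigma(x))=\lim_{N\uparrow\infty}F_N(x),\quad\text{where for every } x\in X\quad F_N(x):=\sup_{1\le n\le N}\big(\psi_n+\sigma(x)\phi_n\big). \]

Hence, Beppo Levi's monotone convergence theorem (notice that $F_N\ge F_1=0$) implies
$\mathcal F(\gamma|\mu)=\lim_{N\to\infty}\mathcal F'_N(\gamma|\mu)$, where

\[ \mathcal F'_N(\gamma|\mu):=\int_A F_N(x)\,\mathrm d\mu(x)+\bar\phi_N\,\gamma(B). \]

It is therefore sufficient to prove that

\[ \mathcal F'_N(\gamma|\mu)\le\mathcal F'(\gamma|\mu)\quad\text{for every } N\in\mathbb N. \tag{2.48} \]

We fix $N\in\mathbb N$, set $\phi_0:=\bar\phi_N$, $\psi_0:=\bar\psi_N$, and recursively define the Borel sets $A_j$, for
$j=0,\dots,N$, with $A_0:=B$ and

\[ A_1:=\{x\in A : F_1(x)=F_N(x)\},\qquad A_j:=\{x\in A : F_N(x)=F_j(x)>F_{j-1}(x)\}\quad\text{for } j=2,\dots,N. \tag{2.49} \]

Since $F_1\le F_2\le\dots\le F_N$, the sets $A_j$, $j=1,\dots,N$, form a Borel partition of $A$. As $\mu$ and $\gamma$ are Radon
measures, for every $\varepsilon>0$ we find disjoint compact sets $K_j\subset A_j$ and disjoint open sets
(by the Hausdorff separation property of $X$) $G_j\supset K_j$ such that

\[ \sum_{j=0}^N\Big(\mu(A_j\setminus K_j)+\gamma(A_j\setminus K_j)\Big)=\mu\Big(X\setminus\bigcup_{j=0}^N K_j\Big)+\gamma\Big(X\setminus\bigcup_{j=0}^N K_j\Big)\le\varepsilon/S_N, \]

where

\[ S_N:=\max_{0\le n\le N}\big[(\phi_n-\phi_{\min})+(\psi_n-\psi_{\min})\big],\qquad \phi_{\min}:=\min_{0\le j\le N}\phi_j,\qquad \psi_{\min}:=\min_{0\le j\le N}\psi_j. \]

Since $(\phi_{\min},\psi_{\min})\in\mathfrak F$ and the sets $G_n$ are disjoint, the lower semicontinuous functions

\[ \psi_N(x):=\psi_{\min}+\sum_{n=0}^N(\psi_n-\psi_{\min})\chi_{G_n}(x),\qquad \phi_N(x):=\phi_{\min}+\sum_{n=0}^N(\phi_n-\phi_{\min})\chi_{G_n}(x) \tag{2.50} \]

take values in $\mathfrak F$ and satisfy

\[ \begin{aligned}\mathcal F'_N(\gamma|\mu)&=\sum_{j=1}^N\int_{A_j}\big(\psi_j+\sigma(x)\phi_j\big)\,\mathrm d\mu(x)+\phi_0\,\gamma(A_0)\\ &=\phi_{\min}\gamma(X)+\psi_{\min}\mu(X)+\sum_{j=0}^N\Big(\int_{A_j}(\phi_j-\phi_{\min})\,\mathrm d\gamma(x)+\int_{A_j}(\psi_j-\psi_{\min})\,\mathrm d\mu(x)\Big)\\ &\le\phi_{\min}\gamma(X)+\psi_{\min}\mu(X)+\sum_{j=0}^N\Big(\int_{K_j}(\phi_j-\phi_{\min})\,\mathrm d\gamma(x)+\int_{K_j}(\psi_j-\psi_{\min})\,\mathrm d\mu(x)\Big)+\varepsilon\\ &\le\int_X\phi_N(x)\,\mathrm d\gamma(x)+\int_X\psi_N(x)\,\mathrm d\mu(x)+\varepsilon.\end{aligned} \]

Since $\varepsilon$ is arbitrary we obtain (2.48).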

Equation (2.45) follows directly by (2.44) and the previous Lemma 2.6. In fact, denoting
by $\mathcal F''$ the right-hand side of (2.45), Lemma 2.6 shows that $\mathcal F''(\gamma|\mu)\le\mathcal F(\gamma|\mu)=
\mathcal F'(\gamma|\mu)$. On the other hand, if $\phi,\psi\in\mathrm{LSC}_s(X)$ with $(\phi,\psi)\in\mathfrak F$ then $-\phi\ge R^*(\psi)$.
Hence, $R^*(\psi)\in\mathrm{LSC}_s(X)$ since $R^*$ is nondecreasing, does not take the value $-\infty$, and is
bounded from above by $-\phi$. We thus get $\mathcal F''(\gamma|\mu)\ge\mathcal F'(\gamma|\mu)$.

In order to show (2.46) we observe that for every $\psi\in\mathrm{LSC}_s(X)$ with $R^*(\psi)\in\mathrm{LSC}_s(X)$
we can set $\varphi:=R^*(\psi)\in\mathrm{LSC}_s(X)$; since $(\psi,-R^*(\psi))\in\mathfrak R$, (2.31) yields $\psi\le-F^*(-\varphi)=
F^\circ(\varphi)$, so that $\int F^\circ(\varphi)\,\mathrm d\mu-\int\varphi\,\mathrm d\gamma\ge\int\psi\,\mathrm d\mu-\int R^*(\psi)\,\mathrm d\gamma$. Since $F^\circ$ cannot take the
value $+\infty$, we also have that $(-\varphi,F^\circ(\varphi))\in\bar{\mathfrak F}$, so that $\int F^\circ(\varphi)\,\mathrm d\mu-\int\varphi\,\mathrm d\gamma\le\mathcal F(\gamma|\mu)$ by
Lemma 2.6.

When one replaces $\mathrm{LSC}_s(X)$ with $\mathrm{LSC}_b(X)$ or $\mathrm B_b(X)$ in (2.44), the supremum is taken
on a larger set, so that the right-hand side of (2.44) cannot decrease; on the other hand,
Lemma 2.6 shows that $\mathcal F(\gamma|\mu)$ still provides an upper bound even if $\phi,\psi$ are in $\mathrm B_b(X)$,
thus duality also holds in this case. The same argument applies to (2.45) or (2.46). □


The following result provides lower semicontinuity of the relative entropy or of an 
increasing sequence of relative entropies. 


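Corollary 2.9. The functional $\mathcal F$ is jointly convex and lower semicontinuous in $\mathcal M(X)\times
\mathcal M(X)$. More generally, if $F_n\in\Gamma(\mathbb R_+)$, $n\in\mathbb N$, is an increasing sequence pointwise
converging to $F$ and $(\mu,\gamma)\in\mathcal M(X)\times\mathcal M(X)$ is the narrow limit of a sequence $(\mu_n,\gamma_n)\in
\mathcal M(X)\times\mathcal M(X)$, then the corresponding entropy functionals $\mathcal F_n$, $\mathcal F$ satisfy

\[ \liminf_{n\to\infty}\mathcal F_n(\gamma_n|\mu_n)\ \ge\ \mathcal F(\gamma|\mu). \tag{2.51} \]

Proof. The lower semicontinuity of $\mathcal F$ follows by (2.44), which provides a representation
of $\mathcal F$ as the supremum of a family of lower semicontinuous functionals for the narrow
topology. Using $F_n\ge F_m$ for $n\ge m$ fixed, we have

\[ \liminf_{n\to\infty}\mathcal F_n(\gamma_n|\mu_n)\ \ge\ \liminf_{n\to\infty}\mathcal F_m(\gamma_n|\mu_n)\ \ge\ \mathcal F_m(\gamma|\mu) \]

by the above lower semicontinuity. Hence, it suffices to check that

\[ \lim_{n\to\infty}\mathcal F_n(\gamma|\mu)=\mathcal F(\gamma|\mu)\quad\text{for every } \gamma,\mu\in\mathcal M(X). \tag{2.52} \]

This formula follows easily by the monotonicity $\mathfrak F_n\subset\mathfrak F_{n+1}$ of the convex sets $\mathfrak F_n$ (associated to $F_n$
by (2.21)) and by the fact that every point of the interior of $\mathfrak F$ belongs to $\bigcup_n\mathfrak F_n$, since $F_n^*$ is pointwise decreasing
to $F^*$. Thus for every couple of simple and lower semicontinuous functions $(\phi,\psi)$ taking
values in the interior of $\mathfrak F$ we have $(\phi(x),\psi(x))\in\mathfrak F_N$ for every $x\in X$ and a sufficiently large $N$, so that

\[ \liminf_{n\to\infty}\mathcal F_n(\gamma|\mu)\ \ge\ \int_X\psi\,\mathrm d\mu+\int_X\phi\,\mathrm d\gamma. \]

Since $\phi,\psi$ are arbitrary we conclude by applying the duality formula (2.44). □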


Next, we provide a compactness result for the sublevels of the relative entropy, which
will be useful in Section 3 (see Theorem 3.3 and Lemma 3.9).

Proposition 2.10 (Boundedness and tightness). If $\mathcal K\subset\mathcal M(X)$ is bounded and $F'_\infty>0$,
then for every $C\ge0$ the sublevels of $\mathcal F$

\[ S_C:=\big\{\gamma\in\mathcal M(X) : \mathcal F(\gamma|\mu)\le C \text{ for some } \mu\in\mathcal K\big\} \tag{2.53} \]

are bounded. If moreover $\mathcal K$ is equally tight and $F'_\infty=\infty$, then the sets $S_C$ are equally
tight.

Proof. Concerning the properties of $S_C$, we will use the inequality

\[ \lambda\gamma(B)\le\mathcal F(\gamma|\mu)+F^*(\lambda)\mu(B)\quad\text{for every } \lambda\in(0,F'_\infty) \text{ and } B\in\mathcal B(X). \tag{2.54} \]

This follows easily by integrating the Young inequality $\lambda\sigma\le F(\sigma)+F^*(\lambda)$ for $\lambda>0$ and
the decomposition $\gamma=\sigma\mu+\gamma^\perp$ in $B$ with respect to $\mu$, and by observing that

\[ \lambda\gamma(B)=\lambda\int_B\sigma\,\mathrm d\mu+\lambda\gamma^\perp(B)\le\lambda\int_B\sigma\,\mathrm d\mu+F'_\infty\gamma^\perp(B)\quad\text{if } 0<\lambda<F'_\infty. \]

Choosing first $B=X$ in (2.54) and an arbitrary $\lambda$ in $(0,F'_\infty)$ (notice that $F^*(\lambda)<\infty$
thanks to (2.18)) we immediately get a uniform bound of $\gamma(X)$ for every $\gamma\in S_C$.

In order to prove the tightness when $F'_\infty=\infty$, whenever $\varepsilon>0$ is given, we can choose
$\lambda=2C/\varepsilon$ and $\eta>0$ so small that $\eta F^*(\lambda)/\lambda\le\varepsilon/2$, and then a compact set $K\subset X$
such that $\mu(X\setminus K)\le\eta$ for every $\mu\in\mathcal K$. Then (2.54) shows that $\gamma(X\setminus K)\le\varepsilon$ for every
$\gamma\in S_C$. □
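Inequality (2.54) is elementary but central to the argument, so a quick randomized check may be reassuring. The sketch below is an illustration under our own assumptions: discrete measures with $\gamma\ll\mu$, the logarithmic entropy $F=U_1$ with $F^*(\lambda)=e^\lambda-1$, and the hypothetical helper names `u1` and `young_gap`.

```python
import math
import random


def u1(s):
    """U_1(s) = s log s - s + 1, with the limit value U_1(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0


def young_gap(lam, gamma, mu, B):
    """Right-hand side minus left-hand side of (2.54) for F = U_1.

    gamma, mu: weights with mu[k] > 0, so that gamma << mu.
    B: list of indices describing the Borel set.
    """
    entropy = sum(m * u1(g / m) for g, m in zip(gamma, mu))
    conj = math.exp(lam) - 1.0          # F*(lambda) for F = U_1
    return entropy + conj * sum(mu[k] for k in B) - lam * sum(gamma[k] for k in B)


random.seed(0)
gaps = []
for _ in range(1000):
    mu = [random.uniform(0.1, 2.0) for _ in range(5)]
    gamma = [random.uniform(0.0, 3.0) for _ in range(5)]
    B = [k for k in range(5) if random.random() < 0.5]
    gaps.append(young_gap(random.uniform(0.01, 5.0), gamma, mu, B))
```

All sampled gaps are nonnegative up to rounding, as predicted by the Young inequality together with $F\ge0$ outside $B$.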


We conclude this section with a useful representation of $\mathcal F$ in terms of the reverse
entropy $R$ of (2.28) and the corresponding functional $\mathcal R$. We will use the result in Section 3
for the reverse formulation of the primal entropy-transport problem.

Lemma 2.11. For every $\gamma,\mu\in\mathcal M(X)$ we have

\[ \mathcal F(\gamma|\mu)=\int_X R(\varrho(x))\,\mathrm d\gamma(x)+R'_\infty\,\mu^\perp(X), \tag{2.55} \]

where $\mu=\varrho\gamma+\mu^\perp$ is the reverse Lebesgue decomposition given by (2.8). In particular

\[ \mathcal F(\gamma|\mu)=\mathcal R(\mu|\gamma). \tag{2.56} \]

Proof. It is an immediate consequence of the dual characterization in (2.44) and the
equivalence in (2.30). □
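The identity (2.56) is easy to test on a finite set. The following sketch is a hedged illustration with hypothetical names (`relative_entropy`, `U1`, `U0`): it takes $F=U_1$, whose reverse entropy is $R=U_0$ with $R'_\infty=F(0)=1$, and a pair of measures for which $\mu$ has a part singular to $\gamma$.

```python
import math


def relative_entropy(F, F_inf, gamma, mu):
    """Discrete version of (2.34) for weight lists gamma and mu."""
    total = 0.0
    for g, m in zip(gamma, mu):
        if m > 0:
            total += m * F(g / m)
        elif g > 0:
            total += F_inf * g
    return total


def U1(s):
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0


def U0(s):
    return s - 1.0 - math.log(s) if s > 0 else math.inf


gamma = [0.5, 0.0, 2.0]
mu = [1.0, 3.0, 1.0]
forward = relative_entropy(U1, math.inf, gamma, mu)   # F(gamma|mu) with F = U_1
backward = relative_entropy(U0, 1.0, mu, gamma)       # R(mu|gamma) with R = U_0
```

The point where $\gamma$ vanishes contributes $F(0)\mu$ to the first functional and $R'_\infty\mu^\perp$ to the second one, and these two contributions coincide.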
3 Optimal Entropy-Transport problems

The major object of Part I is the entropy-transport functional, where two measures $\mu_1\in
\mathcal M(X_1)$ and $\mu_2\in\mathcal M(X_2)$ are given, and one has to find a transport plan $\gamma\in\mathcal M(X_1\times X_2)$
that minimizes the functional.


3.1 The basic setting 

Let us fix the basic set of data for Entropy-Transport problems. We are given

- two Hausdorff topological spaces $(X_i,\tau_i)$, $i=1,2$, which define the Cartesian product
$X:=X_1\times X_2$ and the canonical projections $\pi^i:X\to X_i$;

- two entropy functions $F_i\in\Gamma(\mathbb R_+)$, thus satisfying (2.13);

- a proper lower semicontinuous cost function $c:X\to[0,+\infty]$;

- a couple of nonnegative Radon measures $\mu_i\in\mathcal M(X_i)$ with finite mass $m_i:=\mu_i(X_i)$
satisfying the compatibility condition

\[ J:=\big(m_1\mathrm{Dom}(F_1)\big)\cap\big(m_2\mathrm{Dom}(F_2)\big)\neq\emptyset. \tag{3.1} \]

We will often assume that the above basic setting is also coercive: this means that at least
one of the following two coercivity conditions holds:

\[ F_1 \text{ and } F_2 \text{ are superlinear, i.e. } (F_i)'_\infty=+\infty; \tag{3.2a} \]

\[ (F_1)'_\infty+(F_2)'_\infty+\inf c>0 \text{ and } c \text{ has compact sublevels.} \tag{3.2b} \]

For every transport plan $\gamma\in\mathcal M(X)$ we define the marginals $\gamma_i:=\pi^i_\sharp\gamma$ and, as in (2.34),
we define the relative entropies

\[ \mathcal F_i(\gamma_i|\mu_i):=\int_{X_i}F_i(\sigma_i(x_i))\,\mathrm d\mu_i+(F_i)'_\infty\,\gamma_i^\perp(X_i),\qquad \gamma_i=\pi^i_\sharp\gamma=\sigma_i\mu_i+\gamma_i^\perp,\quad \sigma_i=\frac{\mathrm d\gamma_i}{\mathrm d\mu_i}. \tag{3.3} \]

With this, we introduce the Entropy-Transport functional as

\[ \mathcal E(\gamma|\mu_1,\mu_2):=\sum_i\mathcal F_i(\gamma_i|\mu_i)+\int_X c(x_1,x_2)\,\mathrm d\gamma(x_1,x_2), \tag{3.4} \]

possibly taking the value $+\infty$. Our basic setting is feasible if the functional $\mathcal E$ is not
identically $+\infty$, i.e. there exists at least one plan $\gamma$ with $\mathcal E(\gamma|\mu_1,\mu_2)<\infty$.


3.2 The primal formulation of the Optimal Entropy-Transport problem

In the basic setting described in the previous Section 3.1, we want to investigate the
following problem.

Problem 3.1 (Entropy-Transport minimization). Given $\mu_i\in\mathcal M(X_i)$ find $\gamma\in\mathcal M(X)=
\mathcal M(X_1\times X_2)$ minimizing $\mathcal E(\gamma|\mu_1,\mu_2)$, i.e.

\[ \mathcal E(\gamma|\mu_1,\mu_2)=\mathrm{ET}(\mu_1,\mu_2):=\inf_{\sigma\in\mathcal M(X)}\mathcal E(\sigma|\mu_1,\mu_2). \tag{3.5} \]

We denote by $\mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)\subset\mathcal M(X)$ the collection of all the minimizers of (3.5).

Remark 3.2 (Feasibility conditions). Problem 3.1 is feasible if there exists at least one
plan $\gamma$ with $\mathcal E(\gamma|\mu_1,\mu_2)<\infty$. Notice that this is always the case when

\[ F_i(0)<\infty,\qquad i=1,2, \tag{3.6} \]

since among the competitors one can choose the null plan $\eta_0$, so that

\[ \mathrm{ET}(\mu_1,\mu_2)\le\mathcal E(\eta_0|\mu_1,\mu_2)=F_1(0)\mu_1(X_1)+F_2(0)\mu_2(X_2). \tag{3.7} \]

More generally, thanks to (3.1) a sufficient condition for feasibility in the nondegenerate
case $m_1m_2\neq0$ is that there exist functions $B_1$ and $B_2$ with

\[ c(x_1,x_2)\le B_1(x_1)+B_2(x_2),\qquad B_i\in L^1(X_i,\mu_i). \tag{3.8} \]

In fact, the plans

\[ \gamma=\frac{\theta}{m_1m_2}\,\mu_1\otimes\mu_2\quad\text{with } \theta\in J \text{ given by (3.1)} \tag{3.9} \]

are Radon [?, Thm. 17, p. 63], have finite cost and provide the estimate

\[ \mathrm{ET}(\mu_1,\mu_2)\le m_1F_1(\theta/m_1)+m_2F_2(\theta/m_2)+\theta\sum_i m_i^{-1}\|B_i\|_{L^1(X_i,\mu_i)}\quad\text{for every } \theta\in J. \tag{3.10} \]

Notice that (3.1) is also necessary for feasibility: in fact, setting $m_{i,n}:=m_i+n^{-1}\gamma_i^\perp(X_i)$,
the convexity of $F_i$, the definition (2.15) of $(F_i)'_\infty$, and Jensen's inequality provide

\[ \begin{aligned}\mathcal F_i(\gamma_i|\mu_i)&=\int_{X_i}F_i(\sigma_i)\,\mathrm d\mu_i+\lim_{n\uparrow\infty}\int_{X_i}F_i(n)\,\mathrm d(n^{-1}\gamma_i^\perp)\ \ge\ \lim_{n\uparrow\infty}m_{i,n}F_i\big(\gamma_i(X_i)/m_{i,n}\big)\\ &\ge m_iF_i(m/m_i),\qquad\text{where } m:=\gamma_i(X_i)=\gamma(X).\end{aligned} \tag{3.11} \]

Thus, whenever $\mathcal E(\gamma|\mu_1,\mu_2)<\infty$, we have

\[ \mathcal E(\gamma|\mu_1,\mu_2)\ \ge\ m\inf c+m_1F_1(m/m_1)+m_2F_2(m/m_2), \tag{3.12} \]

and therefore

\[ m=\gamma(X)\in\big(m_1\mathrm{Dom}(F_1)\big)\cap\big(m_2\mathrm{Dom}(F_2)\big)=J. \tag{3.13} \]

We will often strengthen (3.1) by assuming that at least one of the domains of the entropies
$F_i$ has nonempty interior, containing a point of the other domain:

\[ \Big(\mathrm{int}\big(m_1\mathrm{Dom}(F_1)\big)\cap m_2\mathrm{Dom}(F_2)\Big)\cup\Big(m_1\mathrm{Dom}(F_1)\cap\mathrm{int}\big(m_2\mathrm{Dom}(F_2)\big)\Big)\neq\emptyset. \tag{3.14} \]

This condition is surely satisfied if $J$ has nonempty interior, i.e. $\max(m_1s_1^-,m_2s_2^-)<
\min(m_1s_1^+,m_2s_2^+)$, where $s_i^-:=\inf\mathrm{Dom}(F_i)$, $s_i^+:=\sup\mathrm{Dom}(F_i)$. □

We also observe that whenever $\mu_i(X_i)=0$ then the null plan $\gamma=\eta_0$ provides the
trivial solution to Problem 3.1. Another trivial case occurs when $F_i(0)<\infty$ and $F_i$ are
nondecreasing in $\mathrm{Dom}(F_i)$ (in particular when $F_i(0)=0$). Then it is clear that the null
plan is a minimizer and $\mathrm{ET}(\mu_1,\mu_2)=F_1(0)m_1+F_2(0)m_2$.
3.3 Examples

Let us consider a few particular cases:

E.1 Costless transport: Consider the case $c\equiv0$. Since $F_i$ are convex, in this case
the minimum is attained when the marginals $\gamma_i$ have constant densities. Setting
$\sigma_i\equiv\theta/m_i$ in order to have $m_1\sigma_1=m_2\sigma_2$, we thus have

\[ \mathrm{ET}(\mu_1,\mu_2)=H_0(m_1,m_2):=\min\big\{m_1F_1(\theta/m_1)+m_2F_2(\theta/m_2) : \theta\ge0\big\}. \tag{3.15} \]

E.2 Entropy-potential problems: If $\mu_2\equiv\eta_0$ then setting $V(x_1):=\inf_{x_2\in X_2}c(x_1,x_2)$
we easily get

\[ \mathrm{ET}(\mu_1,\eta_0)=\inf_{\gamma\in\mathcal M(X_1)}\mathcal F_1(\gamma|\mu_1)+\int_{X_1}V\,\mathrm d\gamma+(F_2)'_\infty\,\gamma(X_1). \tag{3.16} \]

E.3 Pure transport problems: We choose

\[ F_i(r)=\mathrm I_1(r)=\begin{cases}0 & \text{if } r=1,\\ +\infty & \text{otherwise.}\end{cases} \]

In this case any feasible plan $\gamma$ should have $\mu_1$ and $\mu_2$ as marginals and the functional
just reduces to the pure transport part

\[ \mathrm T(\mu_1,\mu_2)=\min\Big\{\int_{X_1\times X_2}c\,\mathrm d\gamma : \pi^i_\sharp\gamma=\mu_i\Big\}. \tag{3.17} \]

As a necessary condition for feasibility we get $\mu_1(X_1)=\mu_2(X_2)$.

A situation equivalent to the optimal transport case occurs when (3.14) does not
hold. In this case, the set $J$ defined by (3.1) contains only one point $\theta$ which separates
$m_1\mathrm{Dom}(F_1)$ and $m_2\mathrm{Dom}(F_2)$:

\[ \theta=m_1s_1^-=m_2s_2^+\qquad\text{or}\qquad\theta=m_1s_1^+=m_2s_2^-. \tag{3.18} \]

It is not difficult to check that in this case both marginals of a feasible plan have constant
densities $\theta/m_i$ w.r.t. $\mu_i$, so that

\[ \mathrm{ET}(\mu_1,\mu_2)=m_1F_1(\theta/m_1)+m_2F_2(\theta/m_2)+\mathrm T\Big(\frac{\theta}{m_1}\mu_1,\frac{\theta}{m_2}\mu_2\Big). \tag{3.19} \]

E.4 Optimal transport with density constraints: We realize density constraints
by introducing characteristic functions of intervals $[a_i,b_i]$, viz. $F_i(r):=\mathrm I_{[a_i,b_i]}(r)$ with
$a_i\le1\le b_i$. E.g. when $a_i=1$, $b_i=\infty$ we have

\[ \mathrm{ET}(\mu_1,\mu_2)=\min\Big\{\int_{X_1\times X_2}c\,\mathrm d\gamma : \pi^i_\sharp\gamma\ge\mu_i\Big\}. \tag{3.20} \]

For $[a_1,b_1]=[0,1]$ and $[a_2,b_2]=[1,+\infty]$ we get

\[ \mathrm{ET}(\mu_1,\mu_2)=\min\Big\{\int_{X_1\times X_2}c\,\mathrm d\gamma : \pi^1_\sharp\gamma\le\mu_1,\ \pi^2_\sharp\gamma\ge\mu_2\Big\}, \tag{3.21} \]

whose feasibility requires $\mu_1(X_1)\ge\mu_2(X_2)$.
</^i, vrj 7 >/is}, (3.21) 
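E.5 Pure entropy problems: These problems arise if $X_1=X_2=X$ and transport is
forbidden, i.e. $(F_i)'_\infty=+\infty$ and

\[ c(x_1,x_2)=\begin{cases}0 & \text{if } x_1=x_2,\\ +\infty & \text{otherwise.}\end{cases} \]

In this case the marginals of $\gamma$ coincide: we denote them by $\gamma$. We can write the
density of $\gamma$ w.r.t. any measure $\mu$ such that $\mu_i\ll\mu$ (say, e.g., $\mu=\mu_1+\mu_2$) as $\gamma=\vartheta\mu$,
and then $\mu_i=\vartheta_i\mu$. Since $\gamma\ll\mu_i$ we have $\vartheta(x)=0$ for $\mu$-a.e. $x$ where $\vartheta_1(x)\vartheta_2(x)=0$.
Thus $\sigma_i=\vartheta/\vartheta_i$ is well defined and we have

\[ \mathcal E(\gamma|\mu_1,\mu_2)=\int_X\Big(\vartheta_1F_1(\vartheta/\vartheta_1)+\vartheta_2F_2(\vartheta/\vartheta_2)\Big)\,\mathrm d\mu, \tag{3.22} \]

with the convention that $\vartheta_iF_i(\vartheta/\vartheta_i)=0$ if $\vartheta=\vartheta_i=0$. Since we expressed everything
in terms of $\mu$, by recalling the definition of the function $H_0$ given in (3.15) we get

\[ \mathrm{ET}(\mu_1,\mu_2)=\int_X H_0(\vartheta_1,\vartheta_2)\,\mathrm d\mu,\qquad \mu_i=\vartheta_i\mu,\quad\text{whenever } \mu_i\ll\mu. \tag{3.23} \]

In the Hellinger case $F_i(s)=U_1(s)=s\log s-s+1$ a simple calculation yields

\[ H_0(\theta_1,\theta_2)=\theta_1+\theta_2-2\sqrt{\theta_1\theta_2}=\big(\sqrt{\theta_1}-\sqrt{\theta_2}\big)^2. \]

In the Jensen-Shannon case, where $F_i(s)=U_0(s)=s-1-\log s$, we obtain

\[ H_0(\theta_1,\theta_2)=\theta_1\log\Big(\frac{2\theta_1}{\theta_1+\theta_2}\Big)+\theta_2\log\Big(\frac{2\theta_2}{\theta_1+\theta_2}\Big). \tag{3.24} \]

Two other interesting examples are provided by the quadratic case $F_i(s)=\frac12(s-1)^2$
and by the nonsmooth "piecewise affine" case $F_i(s)=|s-1|$, for which we obtain

\[ H_0(\theta_1,\theta_2)=\frac{1}{2(\theta_1+\theta_2)}(\theta_1-\theta_2)^2,\qquad\text{and}\qquad H_0(\theta_1,\theta_2)=|\theta_1-\theta_2|,\quad\text{respectively.} \]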
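The four closed-form expressions for $H_0$ can be compared with a direct minimization of (3.15). The sketch below is a numerical illustration under our own assumptions (a uniform grid of $\theta\in(0,10]$ and the hypothetical helper `H0_numeric`), evaluated at $\theta_1=1$, $\theta_2=3$.

```python
import math


def H0_numeric(F, m1, m2, theta_max=10.0, n=100000):
    """Grid minimisation of m1*F(theta/m1) + m2*F(theta/m2) over (0, theta_max]."""
    grid = (theta_max * k / n for k in range(1, n + 1))
    return min(m1 * F(t / m1) + m2 * F(t / m2) for t in grid)


entropies = {
    "hellinger": lambda s: s * math.log(s) - s + 1.0,
    "jensen_shannon": lambda s: s - 1.0 - math.log(s),
    "quadratic": lambda s: 0.5 * (s - 1.0) ** 2,
    "piecewise_affine": lambda s: abs(s - 1.0),
}

t1, t2 = 1.0, 3.0
closed_form = {
    "hellinger": (math.sqrt(t1) - math.sqrt(t2)) ** 2,
    "jensen_shannon": t1 * math.log(2 * t1 / (t1 + t2)) + t2 * math.log(2 * t2 / (t1 + t2)),
    "quadratic": (t1 - t2) ** 2 / (2.0 * (t1 + t2)),
    "piecewise_affine": abs(t1 - t2),
}
numeric = {name: H0_numeric(F, t1, t2) for name, F in entropies.items()}
```

The minimizers are $\theta=\sqrt{\theta_1\theta_2}$ in the Hellinger case, $\theta=(\theta_1+\theta_2)/2$ in the Jensen-Shannon case and $\theta=2\theta_1\theta_2/(\theta_1+\theta_2)$ in the quadratic case, while in the piecewise affine case every $\theta$ between $\theta_1$ and $\theta_2$ is optimal.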


E.6 Regular entropy-transport problems: These problems correspond to the choice
of a couple of differentiable entropies $F_i$ with $\mathrm{Dom}(F_i)\supset(0,\infty)$, as in the case of
the power-like entropies $U_p$ defined in (2.26). When they vanish (and thus have a
minimum) at $s=1$, the Entropic Optimal Transportation can be considered as a
smooth relaxation of the Optimal Transport case E.3.

E.7 Squared Hellinger-Kantorovich distances: For a metric space $(X,\mathsf d)$, set $X_1=
X_2=X$ and let $\tau$ be induced by $\mathsf d$. Further, set $F_1(s)=F_2(s):=U_1(s)=s\log s-s+1$
and

\[ c(x_1,x_2):=-\log\Big(\cos^2\big(\mathsf d(x_1,x_2)\wedge\pi/2\big)\Big)\qquad\text{or simply}\qquad c(x_1,x_2):=\mathsf d^2(x_1,x_2). \]

These cases will be thoroughly studied in the second part of the present paper, see
Section 6.
E.8 Marginal Entropy-Transport problems: In this case one of the two marginals
of $\gamma$ is fixed, say $\gamma_1$, by choosing $F_1(r):=\mathrm I_1(r)$. Thus the functional minimizes the
sum of the transport cost and the relative entropy $\mathcal F_2(\gamma_2|\mu_2)$ of the second marginal
with respect to a reference measure $\mu_2$, namely

\[ \min_{\gamma\in\mathcal M(X_2)}\Big\{\mathcal F_2(\gamma|\mu_2)+\mathrm T(\gamma,\mu_1)\Big\}. \]


This is the typical situation one has to solve at each iteration step of the Minimizing 
Movement scheme [2], when T is a (power of a) transport distance induced by c, as 
in the Jordan-Kinderlehrer-Otto approach [2T] . 

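E.9 The Piccoli-Rossi "generalized Wasserstein distance" [35, 36]: for a metric
space $(X,\mathsf d)$, set $X_1=X_2=X$, let $\tau$ be induced by $\mathsf d$, and consider $F_1(s)=F_2(s):=
V(s)=|s-1|$ with $c(x_1,x_2):=\mathsf d(x_1,x_2)$.

E.10 The discrete case. Let $\mu_1=\sum_i\alpha_i\delta_{x_i}$, $\mu_2=\sum_j\beta_j\delta_{y_j}$ with $\alpha_i,\beta_j>0$, and let
$c_{i,j}:=c(x_i,y_j)$. The Entropy-Transport problem for this discrete model consists in
finding coefficients $\gamma_{i,j}\ge0$ which minimize

\[ \sum_i\alpha_iF_1\Big(\frac{\sum_j\gamma_{i,j}}{\alpha_i}\Big)+\sum_j\beta_jF_2\Big(\frac{\sum_i\gamma_{i,j}}{\beta_j}\Big)+\sum_{i,j}c_{i,j}\gamma_{i,j}. \tag{3.25} \]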
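The discrete functional (3.25) is straightforward to evaluate. The sketch below is a minimal illustration, not a solver: the function name `entropy_transport` and the data are hypothetical, and both entropies default to the logarithmic entropy $U_1$. It evaluates the functional at the null plan, as in (3.7), and at a product plan of the form (3.9) with $c\equiv0$.

```python
import math


def U1(s):
    """Logarithmic entropy with the limit value U_1(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0


def entropy_transport(plan, alpha, beta, cost, F1=U1, F2=U1):
    """Evaluate (3.25) for a plan[i][j] >= 0 and weights alpha[i], beta[j] > 0."""
    m, n = len(alpha), len(beta)
    rows = [sum(plan[i][j] for j in range(n)) for i in range(m)]   # first marginal
    cols = [sum(plan[i][j] for i in range(m)) for j in range(n)]   # second marginal
    value = sum(alpha[i] * F1(rows[i] / alpha[i]) for i in range(m))
    value += sum(beta[j] * F2(cols[j] / beta[j]) for j in range(n))
    value += sum(cost[i][j] * plan[i][j] for i in range(m) for j in range(n))
    return value


alpha, beta = [1.0, 1.0], [3.0]          # total masses m1 = 2 and m2 = 3
cost = [[0.0], [0.0]]                    # costless transport, as in E.1
null_plan = [[0.0], [0.0]]
theta = math.sqrt(2.0 * 3.0)             # optimal total mass in the Hellinger case
product_plan = [[theta * a * b / (2.0 * 3.0) for b in beta] for a in alpha]

null_value = entropy_transport(null_plan, alpha, beta, cost)
product_value = entropy_transport(product_plan, alpha, beta, cost)
```

With these data the product plan reproduces the value $H_0(m_1,m_2)=(\sqrt{m_1}-\sqrt{m_2})^2$ of Example E.1, which is strictly smaller than the value of the null plan.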


3.4 Existence of solutions to the primal problem 

The next result provides a first general existence result for Problem 3.1 in the basic
coercive setting of Section 3.1.

Theorem 3.3 (Existence of minimizers). Let us assume that Problem 3.1 is feasible (see
Remark 3.2) and coercive, i.e. at least one of the following conditions holds:

(i) the entropy functions $F_1$ and $F_2$ are superlinear, i.e. $(F_1)'_\infty=(F_2)'_\infty=+\infty$;

(ii) $c$ has compact sublevels in $X$ and $(F_1)'_\infty+(F_2)'_\infty+\inf c>0$.

Then Problem 3.1 admits at least one optimal solution. In this case $\mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ is a
compact convex set of $\mathcal M(X)$.

Proof. We can apply the Direct Method of Calculus of Variations: since the map $\gamma\mapsto
\mathcal E(\gamma|\mu_1,\mu_2)$ is lower semicontinuous in $\mathcal M(X_1\times X_2)$ by Theorem 2.7, it is sufficient to show
that its sublevels are relatively compact, thus bounded and equally tight by Prokhorov
Theorem 2.2. In both cases boundedness follows by the coercivity assumptions and the
estimate (3.12):

in fact, by the definition 02.151) of {Fif^ we can End s > 0 such that ^Fi{(^) > 
whenever m > smp, if a := inf c -|- (-^*)oo > 0 hie estimate 03.12p yields 

7 (X) < -S’{"flHi, 10 , 2 ) for every 7 G M(X) with 7 (X) > s max(yUi(Vi),/i 2 (V 2 )). 
a 
In case (ii) equal tightness is a consequence of the Markov inequality and the nonnegativity
of $F_i$: in fact, considering the compact sublevels $K_\lambda:=\{(x_1,x_2)\in X_1\times X_2 : c(x_1,x_2)\le
\lambda\}$, we have

\[ \gamma(X\setminus K_\lambda)\le\frac1\lambda\int c\,\mathrm d\gamma\le\frac1\lambda\,\mathcal E(\gamma|\mu_1,\mu_2)\quad\text{for every } \lambda>0. \]

In the case (i), since $c\ge0$, Proposition 2.10 shows that both the marginals of plans in a
sublevel of the energy are equally tight: we thus conclude by [2, Lemma 5.2.2]. □

Remark 3.4. The assumptions (i) and (ii) in the previous Theorem are almost optimal,
and it is possible to find counterexamples when they are not satisfied. In the case when
$0<(F_1)'_\infty+(F_2)'_\infty<\infty$ but $c$ does not have compact sublevels, one can just take

\[ F_i(s):=U_0(s)=s-\log s-1,\qquad X_i:=\mathbb R,\qquad c(x_1,x_2):=3\,e^{-x_1^2-x_2^2},\qquad \mu_i=\delta_0. \]

Any competitor is of the form $\gamma:=\alpha\,\delta_0\otimes\delta_0+\nu_1\otimes\delta_0+\delta_0\otimes\nu_2$ with $\nu_i\in\mathcal M(\mathbb R)$ and
$\nu_i(\{0\})=0$. Setting $n_i:=\nu_i(\mathbb R)$ we find

\[ \mathcal E(\gamma|\mu_1,\mu_2)=F(\alpha+n_1)+F(\alpha+n_2)+3\alpha+3\int e^{-x^2}\,\mathrm d(\nu_1+\nu_2)+n_1+n_2. \]

Since $\min_s F(s)+s=\log2$ is attained at $s=1/2$, we immediately see that

\[ \mathcal E(\gamma|\mu_1,\mu_2)\ \ge\ 2\log2+\alpha+3\int e^{-x^2}\,\mathrm d(\nu_1+\nu_2)\ \ge\ 2\log2. \]

Moreover, $2\log2$ is the infimum, which is approached by choosing $\alpha=0$ and $\nu_1=\nu_2=\frac12\delta_x$,
and letting $x\to\infty$. On the other hand, since $n_1+n_2+\alpha>0$, the infimum can never be
attained.

In the case when $c$ has compact sublevels but $(F_1)'_\infty=(F_2)'_\infty=\min c=0$, it is
sufficient to take $F_i(s):=s^{-1}$, $X_i=[-1,1]$, $c(x_1,x_2)=x_1^2+x_2^2$ and $\mu_i=\delta_0$. Taking
$\gamma_n:=n\,\delta_0\otimes\delta_0$ one easily checks that $\inf\mathcal E(\gamma|\mu_1,\mu_2)=0$ but $\mathcal E(\gamma|\mu_1,\mu_2)>0$ for every
$\gamma\in\mathcal M(\mathbb R^2)$. □
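The first counterexample can be explored numerically. The sketch below is an illustration with hypothetical names (`U0`, `energy`): it evaluates the energy of the plans $\alpha\,\delta_0\otimes\delta_0+n\,\delta_x\otimes\delta_0+n\,\delta_0\otimes\delta_x$ with $x\neq0$, using the formula for $\mathcal E$ given in the remark with $\nu_1=\nu_2=n\,\delta_x$.

```python
import math


def U0(s):
    """Entropy U_0(s) = s - 1 - log s for s > 0."""
    return s - 1.0 - math.log(s)


def energy(x, alpha=0.0, n=0.5):
    """Energy of alpha*d_0(x)d_0 + n*d_x(x)d_0 + n*d_0(x)d_x for x != 0."""
    transport = 3.0 * alpha + 3.0 * math.exp(-x * x) * (2.0 * n)
    # the two singular parts contribute (F_i)'_inf * n = n each
    return 2.0 * U0(alpha + n) + transport + 2.0 * n


infimum = 2.0 * math.log(2.0)
values = [energy(x) for x in (1.0, 2.0, 3.0)]

# min_s U0(s) + s is attained at s = 1/2 and equals log 2
grid_min = min(U0(k / 1000.0) + k / 1000.0 for k in range(1, 3001))
```

The energies decrease towards $2\log2$ as $x$ grows but stay strictly above it, which is the mechanism behind the lack of a minimizer.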

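Let us briefly discuss the question of uniqueness; the first result only addresses the
marginals $\gamma_i=\pi^i_\sharp\gamma$.

Lemma 3.5 (Uniqueness of the marginals in the superlinear strictly convex case). Let us
suppose that $F_i$ are strictly convex functions. Then the $\mu_i$-absolutely continuous part $\sigma_i\mu_i$
of the marginals $\gamma_i=\pi^i_\sharp\gamma$ of any optimal plan are uniquely determined. In particular,
if $F_i$ are also superlinear, then the marginals $\gamma_i$ are uniquely determined, i.e. if $\gamma',\gamma''\in
\mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ then $\pi^i_\sharp\gamma'=\pi^i_\sharp\gamma''$, $i=1,2$.

Proof. It is sufficient to take $\gamma=\frac12\gamma'+\frac12\gamma''$, which is still optimal in $\mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ since $\mathcal E$ is
a convex functional w.r.t. $\gamma$. We have $\pi^i_\sharp\gamma=\gamma_i=\frac12\gamma'_i+\frac12\gamma''_i=\frac12(\sigma'_i+\sigma''_i)\mu_i+\frac12(\gamma'_i)^\perp+\frac12(\gamma''_i)^\perp$
and we observe that the minimality of $\gamma$ and the convexity of each addendum $\mathcal F_i$ in the
functional yield

\[ \mathcal F_i(\gamma_i|\mu_i)=\frac12\mathcal F_i(\gamma'_i|\mu_i)+\frac12\mathcal F_i(\gamma''_i|\mu_i),\qquad i=1,2. \]

Since $\gamma_i^\perp=\frac12(\gamma'_i)^\perp+\frac12(\gamma''_i)^\perp$ we obtain

\[ \int_{X_i}\Big(F_i(\sigma_i)-\frac12F_i(\sigma'_i)-\frac12F_i(\sigma''_i)\Big)\,\mathrm d\mu_i=0,\qquad i=1,2. \]

Since $F_i$ is strictly convex, the above identity implies $\sigma_i=\sigma'_i=\sigma''_i$ $\mu_i$-a.e. in $X_i$. □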

The next corollary reduces the uniqueness question of optimal couplings in $\mathrm{Opt}_{\mathsf{ET}}(\mu_1,\mu_2)$ to corresponding results for the Kantorovich problem associated to the cost $c$.

Corollary 3.6. Let us suppose that $F_i$ are superlinear strictly convex functions and that for every couple of probability measures $\nu_i\in\mathcal P(X_i)$ with $\nu_i\ll\mu_i$ the optimal transport problem associated to the cost $c$ (see Example E.3 of Section 3.3) admits a unique solution. Then $\mathrm{Opt}_{\mathsf{ET}}(\mu_1,\mu_2)$ contains at most one plan.

Proof. We can assume $m_i = \mu_i(X_i) > 0$ for $i = 1,2$. It is clear that any $\boldsymbol\gamma\in\mathrm{Opt}_{\mathsf{ET}}(\mu_1,\mu_2)$ is a solution of the optimal transport problem for the cost $c$ and given (possibly normalized) marginals $\gamma_i$. Since $\gamma_i\ll\mu_i$ and $\gamma_1$ and $\gamma_2$ are unique by Lemma 3.5, we conclude. □


Example 3.7 (Uniqueness in Euclidean spaces). If $F_i$ are superlinear strictly convex functions, $c(x,y) = h(x-y)$ for a strictly convex function $h:\mathbb R^d\to[0,\infty)$ and $\mu_1\ll\mathscr L^d$, then Problem 3.1 admits at most one solution. It is sufficient to apply the previous corollary in conjunction with [2, Theorem 6.2.4].

Example 3.8 (Nonuniqueness of optimal couplings). Consider the logarithmic density functionals $F_i(s) = U_1(s) = s\log s - s + 1$, the Euclidean space $X_1 = X_2 = \mathbb R^2$ and any cost $c$ of the form $c(x_1,x_2) = h(|x_1-x_2|)$. For the measures

\[ \mu_1 = \delta_{(-1,0)} + \delta_{(1,0)}, \quad\text{and } \mu_2 \text{ with support in } \{0\}\times\mathbb R \text{ and containing at least two points,} \]

there is an infinite number of optimal plans. In fact, we shall see that the first marginal $\gamma_1$ of any optimal plan $\boldsymbol\gamma$ will have full support in $(-1,0),(1,0)$, i.e. it will be of the form $a\,\delta_{(-1,0)} + b\,\delta_{(1,0)}$ with strictly positive $a,b$, and the support of the second marginal $\gamma_2$ will be concentrated in $\{0\}\times\mathbb R$ and will contain at least two points. Any plan $\boldsymbol\sigma$ with marginals $\gamma_1,\gamma_2$ will then be optimal, since it can be written as the disintegration

\[ \boldsymbol\sigma = \int\big(\alpha(y)\,\delta_{(-1,0)} + \beta(y)\,\delta_{(1,0)}\big)\,d\gamma_2(y) \]

with arbitrary nonnegative densities $\alpha,\beta$ with $\alpha+\beta = 1$ and $\int\alpha\,d\gamma_2(y) = a$, $\int\beta\,d\gamma_2(y) = b$. In fact, the cost contribution of $\boldsymbol\sigma$ to the total energy is

\[ \int h\big(\sqrt{1+y^2}\big)\,d\gamma_2(y) \]

and it is independent of the choice of $\alpha$ and $\beta$. □
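The independence of the transport cost from the splitting $(\alpha,\beta)$ can be illustrated on a discrete second marginal. This is only a sketch under illustrative assumptions: the profile $h(d)=d^2$, the support points `ys`, the weights `g` and the two splittings are hypothetical sample data, and `transport_cost` is a helper name introduced here.

```python
import math

def h(d):
    """Sample cost profile h(d) = d^2 (an illustrative choice)."""
    return d * d

def transport_cost(ys, g, alpha):
    """Cost of the plan sending the mass g[j] sitting at (0, ys[j]) to (-1, 0)
    with fraction alpha[j] and to (1, 0) with fraction 1 - alpha[j]."""
    total = 0.0
    for y, gj, aj in zip(ys, g, alpha):
        d_left = math.hypot(-1.0 - 0.0, 0.0 - y)   # distance from (0, y) to (-1, 0)
        d_right = math.hypot(1.0 - 0.0, 0.0 - y)   # distance from (0, y) to (1, 0)
        total += gj * (aj * h(d_left) + (1.0 - aj) * h(d_right))
    return total

ys = [-2.0, 0.5, 3.0]          # support of the second marginal on {0} x R
g = [0.3, 1.1, 0.6]            # masses of the second marginal
alpha_a = [0.0, 0.5, 1.0]      # one admissible splitting
alpha_b = [0.9, 0.1, 0.25]     # another admissible splitting
reference = sum(gj * h(math.sqrt(1.0 + y * y)) for y, gj in zip(ys, g))

if __name__ == "__main__":
    print(transport_cost(ys, g, alpha_a), transport_cost(ys, g, alpha_b), reference)
```

Both points $(\pm1,0)$ are at the same distance $\sqrt{1+y^2}$ from $(0,y)$, which is why every splitting yields the same cost.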

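We conclude this section by proving a simple lower semicontinuity property for the energy-transport functional $\mathsf{ET}$. Note that in metrizable spaces any weakly convergent sequence of Radon measures is tight.

Lemma 3.9. Let $\mathbb L$ be a directed set, $(F_i^\lambda)_{\lambda\in\mathbb L}$ and $(c^\lambda)_{\lambda\in\mathbb L}$ be monotone nets of superlinear entropies and costs pointwise converging to $F_i$ and $c$ respectively, and let $(\mu_i^\lambda)_{\lambda\in\mathbb L}$ be equally tight nets of measures narrowly converging to $\mu_i$ in $\mathcal M(X_i)$. Denoting by $\mathsf{ET}^\lambda$ (resp. $\mathsf{ET}$) the corresponding Entropy-Transport functionals induced by $F_i^\lambda$ and $c^\lambda$ (resp. $F_i$ and $c$) we have

\[ \liminf_{\lambda\in\mathbb L}\mathsf{ET}^\lambda(\mu_1^\lambda,\mu_2^\lambda) \ge \mathsf{ET}(\mu_1,\mu_2). \tag{3.26} \]

Proof. Let $\boldsymbol\gamma^\lambda\in\mathrm{Opt}_{\mathsf{ET}^\lambda}(\mu_1^\lambda,\mu_2^\lambda)\subset\mathcal M(\boldsymbol X)$ be a corresponding net of optimal plans. The statement follows if, assuming that $\mathscr E^\lambda(\boldsymbol\gamma^\lambda|\mu_1^\lambda,\mu_2^\lambda) = \mathsf{ET}^\lambda(\mu_1^\lambda,\mu_2^\lambda)\le C<\infty$, we can prove that $\mathsf{ET}(\mu_1,\mu_2)\le C$. By applying Proposition 2.10 we obtain that the nets of marginals $\pi^i_\sharp\boldsymbol\gamma^\lambda$ are tight in $\mathcal M(X_i)$, so that the net $\boldsymbol\gamma^\lambda$ is also tight. By extracting a suitable subnet (not relabeled) narrowly converging to $\boldsymbol\gamma$ in $\mathcal M(\boldsymbol X)$, we can still apply Proposition 2.10 and the lower semicontinuity of the entropy part $\mathscr F$ of the functional $\mathscr E$ to obtain $\liminf_{\lambda\in\mathbb L}\mathscr F^\lambda(\boldsymbol\gamma^\lambda|\mu_1^\lambda,\mu_2^\lambda)\ge\mathscr F(\boldsymbol\gamma|\mu_1,\mu_2)$. A completely analogous argument shows that $\liminf_{\lambda\in\mathbb L}\int c^\lambda\,d\boldsymbol\gamma^\lambda\ge\int c\,d\boldsymbol\gamma$. □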

As a simple application we prove the extremality of the class of Optimal Transport problems (see Example E.3 in Section 3.3) in the set of entropy-transport problems.

Corollary 3.10. Let $F_1,F_2\in\Gamma(\mathbb R_+)$ be satisfying $F_i(r) > F_i(1) = 0$ for every $r\in[0,\infty)$, $r\ne 1$, and let $\mathsf{ET}^n$ be the Optimal Entropy Transport value (3.5) associated to $(nF_1,nF_2)$. Then for every couple of equally tight sequences $(\mu_{1,n},\mu_{2,n})\subset\mathcal M(X_1)\times\mathcal M(X_2)$, $n\in\mathbb N$, narrowly converging to $(\mu_1,\mu_2)$ we have

\[ \lim_{n\uparrow\infty}\mathsf{ET}^n(\mu_{1,n},\mu_{2,n}) = \mathsf T(\mu_1,\mu_2). \tag{3.27} \]


3.5 The reverse formulation of the primal problem 

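Let us introduce the reverse entropy functions $R_i$ (see (2.28)) via

\[ R_i(r) := \begin{cases} r\,F_i(1/r) & \text{if } r > 0,\\ (F_i)'_\infty & \text{if } r = 0,\end{cases} \tag{3.28} \]

and let $\mathscr R_i$ be the corresponding integral functionals as in (2.55). Keeping the notation of Lemma 2.3

\[ \gamma_i := \pi^i_\sharp\boldsymbol\gamma\in\mathcal M(X_i), \quad \mu_i = \varrho_i\gamma_i + \mu_i^\perp, \tag{3.29} \]

we can thus define

\[ \mathscr R(\mu_1,\mu_2|\boldsymbol\gamma) := \sum_i\mathscr R_i(\mu_i|\gamma_i) + \int_{\boldsymbol X} c\,d\boldsymbol\gamma = \int_{\boldsymbol X}\Big(R_1(\varrho_1(x_1)) + R_2(\varrho_2(x_2)) + c(x_1,x_2)\Big)\,d\boldsymbol\gamma + \sum_i F_i(0)\,\mu_i^\perp(X_i). \tag{3.30} \]

By Lemma 2.11 we easily get the reverse formulation of the optimal Entropy-Transport Problem 3.1.

Theorem 3.11. For every $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ and $\mu_i\in\mathcal M(X_i)$

\[ \mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) = \mathscr R(\mu_1,\mu_2|\boldsymbol\gamma). \tag{3.31} \]

In particular

\[ \mathsf{ET}(\mu_1,\mu_2) = \inf_{\boldsymbol\gamma\in\mathcal M(\boldsymbol X)}\mathscr R(\mu_1,\mu_2|\boldsymbol\gamma), \tag{3.32} \]

and $\boldsymbol\gamma\in\mathrm{Opt}_{\mathsf{ET}}(\mu_1,\mu_2)$ if and only if it minimizes $\mathscr R(\mu_1,\mu_2|\cdot)$ in $\mathcal M(\boldsymbol X)$.

The functional $\mathscr R(\mu_1,\mu_2|\cdot)$ is still a convex functional and it will be useful in a later section.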
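The identity between the primal and the reverse energies asserted by Theorem 3.11 can be checked on finite spaces. The sketch below assumes the logarithmic entropy $F_i(s) = s\log s - s + 1$ and strictly positive discrete measures, so that no singular parts appear; the data `mu1`, `mu2`, `gamma`, `c` and all function names are hypothetical illustrations.

```python
import math

def F(s):
    """Logarithmic entropy U_1(s) = s log s - s + 1, with F(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0

def R(r):
    """Reverse entropy R(r) = r F(1/r) for r > 0 (the case r = 0 is not needed here)."""
    return r * F(1.0 / r)

def marginals(gamma):
    """Row and column sums of the plan matrix gamma."""
    return [sum(row) for row in gamma], [sum(col) for col in zip(*gamma)]

def transport(gamma, c):
    return sum(cij * gij for crow, grow in zip(c, gamma) for cij, gij in zip(crow, grow))

def primal_energy(gamma, mu1, mu2, c):
    """Sum of the entropies of gamma_i w.r.t. mu_i plus the transport cost."""
    g1, g2 = marginals(gamma)
    ent = sum(F(g / m) * m for g, m in zip(g1, mu1)) + sum(F(g / m) * m for g, m in zip(g2, mu2))
    return ent + transport(gamma, c)

def reverse_energy(gamma, mu1, mu2, c):
    """Sum of the reverse entropies of mu_i w.r.t. gamma_i plus the transport cost."""
    g1, g2 = marginals(gamma)
    ent = sum(R(m / g) * g for g, m in zip(g1, mu1)) + sum(R(m / g) * g for g, m in zip(g2, mu2))
    return ent + transport(gamma, c)

mu1 = [1.0, 2.0]
mu2 = [0.5, 1.5, 1.0]
gamma = [[0.2, 0.3, 0.1], [0.4, 0.1, 0.6]]
c = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0]]

if __name__ == "__main__":
    print(primal_energy(gamma, mu1, mu2, c), reverse_energy(gamma, mu1, mu2, c))
```

For this choice of $F$ the reverse entropy is $R(r) = r - 1 - \log r$, which the test below also verifies.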


4 The dual problem 

In this section we want to compute and study the dual problem and the corresponding optimality conditions for the Entropy-Transport Problem 3.1 in the basic coercive setting of Section 3.1.


4.1 The “inf-sup” derivation of the dual problem in the basic 
coercive setting 

In order to write the first formulation of the dual problem we introduce the reverse entropy functions $R_i$ defined as in (2.28) or Section 3.5 and their conjugate $R_i^*:\mathbb R\to(-\infty,+\infty]$, which can be expressed by

\[ R_i^*(\psi) := \sup_{s>0}\big(s\psi - sF_i(1/s)\big) = \sup_{r>0}\big(\psi - F_i(r)\big)/r. \tag{4.1} \]

The equivalences (2.31) yield, for all $(\phi,\psi)\in\mathbb R^2$,

\[ (\phi,\psi)\in\mathfrak F_i \quad\Longleftrightarrow\quad \phi\le -R_i^*(\psi). \tag{4.2} \]
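For the logarithmic entropy $F(r) = r\log r - r + 1$ the conjugates in (4.1) can be computed by hand: $R^*(\psi) = -\log(1-\psi)$ for $\psi<1$ and $F^\circ(\varphi) = \inf_{s>0}(F(s)+s\varphi) = 1 - e^{-\varphi}$. The sketch below compares these closed forms (derived here, not quoted from the text) with a brute-force evaluation on a grid, and checks the equivalence $\psi\le F^\circ(\varphi)\Leftrightarrow R^*(\psi)\le\varphi$, which is the relation of (4.2) written with $\varphi=-\phi$. All names and the grid are illustrative assumptions.

```python
import math

def F(r):
    """Logarithmic entropy, r > 0."""
    return r * math.log(r) - r + 1.0

GRID = [k / 1000.0 for k in range(1, 20001)]   # sample points in (0, 20]

def R_star_numeric(psi):
    """Brute-force value of sup_{r>0} (psi - F(r)) / r over the grid."""
    return max((psi - F(r)) / r for r in GRID)

def R_star_closed(psi):
    """Hand-derived closed form, valid for psi < 1."""
    return -math.log(1.0 - psi)

def F_circ_numeric(phi):
    """Brute-force value of inf_{s>0} (F(s) + s * phi) over the grid."""
    return min(F(s) + s * phi for s in GRID)

def F_circ_closed(phi):
    """Hand-derived closed form of the concave conjugate."""
    return 1.0 - math.exp(-phi)

if __name__ == "__main__":
    for psi in (-2.0, 0.0, 0.5):
        print("R*(", psi, ") =", R_star_numeric(psi), R_star_closed(psi))
    for phi in (-1.0, 0.0, 2.0):
        print("F°(", phi, ") =", F_circ_numeric(phi), F_circ_closed(phi))
```

The maximisers $r = 1-\psi$ for the sample values of $\psi$ lie on the grid, so the first comparison is exact up to rounding, whereas the second one carries a small discretisation error.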


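As a first step we use the dual formulation of the entropy functionals given by Theorem 2.7 (cf. (2.45)) and find

\[ \mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) = \int c\,d\boldsymbol\gamma + \sup\Big\{\sum_i\Big(\int_{X_i}\psi_i\,d\mu_i - \int_{X_i}R_i^*(\psi_i)\,d\gamma_i\Big) : \psi_i,\ R_i^*(\psi_i)\in\mathrm{LSC}_s(X_i)\Big\}. \]

It is natural to introduce the saddle function $\mathscr L(\boldsymbol\gamma,\boldsymbol\psi)$ depending on $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ and $\boldsymbol\psi = (\psi_1,\psi_2)$ (we omit here the dependence on the fixed measures $\mu_i\in\mathcal M(X_i)$)

\[ \mathscr L(\boldsymbol\gamma,\boldsymbol\psi) := \int_{\boldsymbol X}\Big(c(x_1,x_2) - R_1^*(\psi_1(x_1)) - R_2^*(\psi_2(x_2))\Big)\,d\boldsymbol\gamma + \sum_i\int_{X_i}\psi_i\,d\mu_i. \tag{4.3} \]

In order to guarantee that $\mathscr L$ takes real values, we consider the convex set

\[ \mathrm M := \Big\{\boldsymbol\gamma\in\mathcal M(\boldsymbol X) : \int c\,d\boldsymbol\gamma < \infty\Big\}. \tag{4.4} \]

We thus have

\[ \mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) = \sup_{\psi_i,\,R_i^*(\psi_i)\in\mathrm{LSC}_s(X_i)}\mathscr L(\boldsymbol\gamma,\boldsymbol\psi) \]

and the Entropy-Transport Problem can be written as

\[ \mathsf{ET}(\mu_1,\mu_2) = \inf_{\boldsymbol\gamma\in\mathrm M}\ \sup_{\psi_i,\,R_i^*(\psi_i)\in\mathrm{LSC}_s(X_i)}\mathscr L(\boldsymbol\gamma,\boldsymbol\psi). \tag{4.5} \]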

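We can then obtain the dual problem by interchanging the order of inf and sup as in Section 2.2. Let us denote by $\varphi_1\oplus\varphi_2$ the function $(x_1,x_2)\mapsto\varphi_1(x_1)+\varphi_2(x_2)$. Since for every $\boldsymbol\psi = (\psi_1,\psi_2)$ with $R_i^*(\psi_i)\in\mathrm{LSC}_s(X_i)$

\[ \inf_{\boldsymbol\gamma\in\mathrm M}\int\Big(c(x_1,x_2) - R_1^*(\psi_1(x_1)) - R_2^*(\psi_2(x_2))\Big)\,d\boldsymbol\gamma = \begin{cases} 0 & \text{if } R_1^*(\psi_1)\oplus R_2^*(\psi_2)\le c,\\ -\infty & \text{otherwise,}\end{cases} \]

we obtain

\[ \inf_{\boldsymbol\gamma\in\mathrm M}\mathscr L(\boldsymbol\gamma,\boldsymbol\psi) = \begin{cases}\displaystyle\sum_i\int_{X_i}\psi_i\,d\mu_i & \text{if } R_1^*(\psi_1)\oplus R_2^*(\psi_2)\le c,\\ -\infty & \text{otherwise.}\end{cases} \tag{4.6} \]

Thus, (4.6) provides the dual formulation, that we will study in the next section.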


4.2 Dual problem and optimality conditions 

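Problem 4.1 ($\psi$-formulation of the dual problem). Let $R_i^*$ be the convex functions defined by (4.1) and let $\Psi$ be the convex set

\[ \Psi := \Big\{\boldsymbol\psi\in\mathrm{LSC}_s(X_1)\times\mathrm{LSC}_s(X_2) : R_i^*(\psi_i) \text{ bounded},\ R_1^*(\psi_1)\oplus R_2^*(\psi_2)\le c\Big\}. \tag{4.7} \]

The dual Entropy-Transport problem consists in finding a maximizer $\boldsymbol\psi\in\Psi$ for

\[ \mathsf D(\mu_1,\mu_2) = \sup_{\boldsymbol\psi\in\Psi}\int_{X_1}\psi_1\,d\mu_1 + \int_{X_2}\psi_2\,d\mu_2. \tag{4.8} \]

As usual, by operating the change of variable

\[ \varphi_i := R_i^*(\psi_i), \quad F_i^\circ(\varphi_i) := -F_i^*(-\varphi_i), \tag{4.9} \]

we can obtain an equivalent formulation of the dual functional $\mathsf D$ as the supremum of the concave functionals

\[ \mathscr D(\boldsymbol\varphi|\mu_1,\mu_2) := \sum_i\int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i, \tag{4.10} \]

on the simpler convex set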

\[ \Phi := \Big\{\boldsymbol\varphi\in\mathrm{LSC}_s(X_1)\times\mathrm{LSC}_s(X_2),\ F_i^\circ(\varphi_i) \text{ bounded},\ \varphi_1\oplus\varphi_2\le c\Big\}. \tag{4.11} \]

Problem 4.2 ($\varphi$-formulation of the dual problem). Let $F_i^\circ$ be the concave functions defined by (4.9) and let $\Phi$ be the convex set (4.11). The $\varphi$-formulation of the dual Entropy-Transport problem consists in finding a maximizer $\boldsymbol\varphi\in\Phi$ for

\[ \mathsf D'(\mu_1,\mu_2) = \sup_{\boldsymbol\varphi\in\Phi}\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2) = \sup_{\boldsymbol\varphi\in\Phi}\sum_i\int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i. \tag{4.12} \]

Proposition 4.3 (Equivalence of the dual formulations). The $\psi$- and the $\varphi$-formulations of the dual problem are equivalent, $\mathsf D(\mu_1,\mu_2) = \mathsf D'(\mu_1,\mu_2)$.

Proof. Since $R_i^*$ is nondecreasing, for every $\boldsymbol\psi\in\Psi$ the functions $\varphi_i := R_i^*(\psi_i)$ belong to $\mathrm{LSC}_s(X_i)$ and satisfy $\varphi_1\oplus\varphi_2\le c$, with $(-\varphi_i,\psi_i)\in\mathfrak F_i$. It then follows that $\tilde\psi_i := -F_i^*(-\varphi_i) = F_i^\circ(\varphi_i)\ge\psi_i$ are bounded, so that $(\varphi_1,\varphi_2)\in\Phi$ and $\mathsf D'\ge\mathsf D$. An analogous argument shows the converse inequality. □

Since "inf sup $\ge$ sup inf" (cf. (2.10)), our derivation via (4.5) yields

\[ \mathsf{ET}(\mu_1,\mu_2)\ge\mathsf D(\mu_1,\mu_2). \tag{4.13} \]

Using Theorem 2.4 we will show in Section 4.3 that (4.13) is in fact an equality. Before this, we first discuss for which class of functions $\psi_i,\varphi_i$ the dual formulations are still meaningful. Moreover, we analyze the optimality conditions associated to the equality case in (4.13).

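Extension to Borel functions. It is intended that in some cases we will also consider larger classes of potentials $\boldsymbol\psi$ or $\boldsymbol\varphi$ by allowing Borel functions with extended real values under suitable summability conditions.

First of all, recalling (2.19) and (2.29), we extend $R_i^*$ and $F_i^\circ$ to $\bar{\mathbb R}$ by setting

\[ R_i^*(-\infty) := -(F_i)'_\infty, \quad R_i^*(+\infty) := +\infty; \qquad F_i^\circ(-\infty) := -\infty, \quad F_i^\circ(+\infty) := F_i(0), \tag{4.14} \]

and we observe that with the definition above and according to (2.37)-(2.38)

\[ \text{the couples } (-\varphi,F_i^\circ(\varphi)) \text{ and } (-R_i^*(\psi),\psi) \text{ belong to } \mathfrak F_i \text{ whenever } \psi\le F_i(0) \text{ and } \varphi\ge-(F_i)'_\infty. \tag{4.15} \]

We also set

\[ \zeta_1 +_o \zeta_2 := \lim_{n\to\infty}\big((-n)\vee\zeta_1\wedge n\big) + \big((-n)\vee\zeta_2\wedge n\big) \quad\text{for every } \zeta_1,\zeta_2\in\bar{\mathbb R}. \tag{4.16} \]

Notice that $(\pm\infty) +_o (\pm\infty) = \pm\infty$ and in the ambiguous case $+\infty-\infty$ this definition yields $(+\infty) +_o (-\infty) = 0$. We correspondingly extend the definition of $\oplus$ by setting

\[ (\zeta_1\oplus_o\zeta_2)(x_1,x_2) := \zeta_1(x_1) +_o \zeta_2(x_2) \quad\text{for every } \zeta_i\in\mathrm B(X_i;\bar{\mathbb R}). \tag{4.17} \]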
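The extended sum of (4.16) is easy to realise on floating-point extended reals. The following is a minimal sketch (the names `truncate`, `truncated_sum` and `plus_o` are introduced here for illustration): `plus_o` returns the limit of the truncated sums, which is the ordinary sum except in the ambiguous case $(+\infty)+(-\infty)$, where the truncations cancel exactly and the limit is $0$.

```python
import math

def truncate(z, n):
    """The map z -> (-n) v z ^ n, i.e. z clipped to the interval [-n, n]."""
    return max(-n, min(z, n))

def truncated_sum(z1, z2, n):
    """The n-th term of the sequence whose limit defines z1 +_o z2."""
    return truncate(z1, n) + truncate(z2, n)

def plus_o(z1, z2):
    """Limit of truncated_sum(z1, z2, n) as n -> infinity."""
    if math.isinf(z1) and math.isinf(z2) and z1 != z2:
        return 0.0            # (+inf) +_o (-inf) = 0: the truncations are n and -n
    return z1 + z2            # ordinary sum in every other case (may be +-inf)

if __name__ == "__main__":
    print(plus_o(math.inf, -math.inf), plus_o(math.inf, 3.0), plus_o(2.0, 3.0))
    print([truncated_sum(7.0, -2.0, n) for n in range(1, 10)])
```

The second printed list shows the truncated sums increasing from $0$ to $7-2=5$, a simple instance of the monotone behaviour used later in the truncation argument.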

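The following result is the natural extension of Lemma ITHI stating that $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)\ge\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)$ for a larger class of $\boldsymbol\gamma$ and $\boldsymbol\varphi$ as before.

Proposition 4.4 (Dual lower bound for extended real valued potentials). Let $\boldsymbol\gamma$ be a feasible plan and let $\boldsymbol\varphi\in\mathrm B(X_1;\bar{\mathbb R})\times\mathrm B(X_2;\bar{\mathbb R})$ with $\varphi_i\ge-(F_i)'_\infty$, $\varphi_1\oplus_o\varphi_2\le c$ with $(F_i^\circ\circ\varphi_i)_-\in L^1(X_i,\mu_i)$ (resp. $(\varphi_i)_+\in L^1(X_i,\gamma_i)$). Then we have $(\varphi_i)_-\in L^1(X_i,\gamma_i)$ (resp. $(F_i^\circ\circ\varphi_i)_+\in L^1(X_i,\mu_i)$) and

\[ \mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)\ge\sum_i\int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i. \tag{4.18} \]

Remark 4.5. In a similar way, if $\boldsymbol\psi\in\mathrm B(X_1;\bar{\mathbb R})\times\mathrm B(X_2;\bar{\mathbb R})$ with $\psi_i\le F_i(0)$, $R_1^*(\psi_1)\oplus_o R_2^*(\psi_2)\le c$, and $(\psi_i)_-\in L^1(X_i,\mu_i)$ (resp. $(R_i^*\circ\psi_i)_+\in L^1(X_i,\gamma_i)$), then $(R_i^*\circ\psi_i)_-\in L^1(X_i,\gamma_i)$ (resp. $(\psi_i)_+\in L^1(X_i,\mu_i)$) with

\[ \mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)\ge\sum_i\int_{X_i}\psi_i\,d\mu_i. \tag{4.19} \]

Proof. Let us consider (4.18) in the case that $(F_i^\circ\circ\varphi_i)_-\in L^1(X_i,\mu_i)$ (the calculations in the other cases, including (4.19), are completely analogous). Applying Lemma 2.6 (with $\psi_i := F_i^\circ\circ\varphi_i$ and $\phi_i := -\varphi_i$) and (2.39) we obtain $(\varphi_i)_-\in L^1(X_i,\gamma_i)$ and then

\[ \begin{aligned}\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) &= \sum_i\mathscr F_i(\gamma_i|\mu_i) + \int c\,d\boldsymbol\gamma \ge \sum_i\mathscr F_i(\gamma_i|\mu_i) + \int\big(\varphi_1(x_1) +_o \varphi_2(x_2)\big)\,d\boldsymbol\gamma\\ &= \sum_i\mathscr F_i(\gamma_i|\mu_i) + \sum_i\int_{X_i}\varphi_i\,d\gamma_i \ge \sum_i\int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i.\end{aligned} \tag{4.20} \]

Notice that the semi-integrability of $\varphi_i$ w.r.t. $\gamma_i$ yields $\varphi_i(\pi^i(x_1,x_2)) > -\infty$ for $\boldsymbol\gamma$-a.e. $(x_1,x_2)\in\boldsymbol X$, so that $\varphi_1(x_1) +_o \varphi_2(x_2) = \varphi_1(x_1)+\varphi_2(x_2)$ and we can split the integral

\[ +\infty > \int\Big(\sum_i\varphi_i(x_i)\Big)\,d\boldsymbol\gamma = \sum_i\int\varphi_i(x_i)\,d\boldsymbol\gamma = \sum_i\int\varphi_i(x_i)\,d\gamma_i. \qquad □ \]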


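Optimality conditions. If there exists a couple $\boldsymbol\varphi$ as in Proposition 4.4 such that $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) = \mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)$ then all the above inequalities (4.20) should be identities, so that we have

\[ \mathscr F_i(\gamma_i|\mu_i) + \int_{X_i}\varphi_i\,d\gamma_i = \int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i, \quad\text{and}\quad \int\Big(c(x_1,x_2) - \big(\varphi_1(x_1) +_o \varphi_2(x_2)\big)\Big)\,d\boldsymbol\gamma = 0, \]

and the second part of Lemma 2.6 yields

\[ \varphi_1(x_1) +_o \varphi_2(x_2) = c(x_1,x_2) \quad \boldsymbol\gamma\text{-a.e. in } \boldsymbol X, \tag{4.21a} \]
\[ -\varphi_i\in\partial F_i(\sigma_i) \quad (\mu_i+\gamma_i)\text{-a.e. in } A_i, \tag{4.21b} \]
\[ \varphi_i = -(F_i)'_\infty \quad \gamma_i^\perp\text{-a.e. in } A_{\gamma_i}, \tag{4.21c} \]
\[ F_i^\circ(\varphi_i) = F_i(0) \quad \mu_i^\perp\text{-a.e. in } A_{\mu_i}, \tag{4.21d} \]

where $(A_i,A_{\mu_i},A_{\gamma_i})$ is a Borel partition related to the Lebesgue decomposition of the couple $(\gamma_i,\mu_i)$ as in Lemma 2.3. We will show now that the existence of a couple $\boldsymbol\varphi$ satisfying

\[ \boldsymbol\varphi = (\varphi_1,\varphi_2)\in\mathrm B(X_1;\bar{\mathbb R})\times\mathrm B(X_2;\bar{\mathbb R}), \quad \varphi_i\ge-(F_i)'_\infty, \quad \varphi_1\oplus_o\varphi_2\le c, \tag{4.22} \]

and the joint optimality conditions (4.21) is also sufficient to prove that a feasible $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ is optimal. We emphasize that we do not need any integrability assumption on $\boldsymbol\varphi$.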

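Theorem 4.6. Let $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ with $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)<\infty$; if there exists a couple $\boldsymbol\varphi$ as in (4.22) which satisfies the joint optimality conditions (4.21) then $\boldsymbol\gamma$ is optimal.

Proof. We want to repeat the calculations in (4.20) of Proposition 4.4, but now taking care of the integrability issues. We use a clever truncation argument of HJI, based on the maps

\[ T_n:\bar{\mathbb R}\to\mathbb R, \quad T_n(\varphi) := (-n)\vee\varphi\wedge n, \tag{4.23} \]

combined with a corresponding approximation of the entropies $F_i$ given by

\[ F_{i,n}(r) := \max_{|\phi|\le n}\big(\phi\,r - F_i^*(\phi)\big). \tag{4.24} \]

Recalling (4.16), it is not difficult to check that if $\varphi_1 +_o \varphi_2\ge 0$ we have $0\le T_n(\varphi_1)+T_n(\varphi_2)\uparrow\varphi_1 +_o \varphi_2$ as $n\uparrow\infty$, whereas $\varphi_1 +_o \varphi_2\le 0$ yields $0\ge T_n(\varphi_1)+T_n(\varphi_2)\downarrow\varphi_1 +_o \varphi_2$. In particular, if $\boldsymbol\varphi$ satisfies (4.22) then $T_n(\varphi_i)\in\mathrm B_b(X_i)$, $T_n(\varphi_1)\oplus T_n(\varphi_2)\le c$, and $T_n(\varphi_i)\ge-(F_i)'_\infty$ due to $(F_i)'_\infty\ge 0$ and $\varphi_i\ge-(F_i)'_\infty$. The boundedness of $T_n(\varphi_i)$ and Proposition 4.4 yield for every $\tilde{\boldsymbol\gamma}\in\mathcal M(\boldsymbol X)$

\[ \mathscr E(\tilde{\boldsymbol\gamma}|\mu_1,\mu_2)\ge\sum_i\int_{X_i}F_i^\circ(T_n(\varphi_i))\,d\mu_i. \tag{4.25} \]

When $(F_i)'_\infty<\infty$, choosing $n\ge(F_i)'_\infty$ so that $T_n(\varphi_i) = \varphi_i = -(F_i)'_\infty$ $\gamma_i^\perp$-a.e., and applying (ii) of the next Lemma 4.7, we obtain

\[ \begin{aligned}\int_{X_i}F_i^\circ(T_n(\varphi_i))\,d\mu_i &= \int_{X_i}\big(F_{i,n}(\sigma_i)+\sigma_iT_n(\varphi_i)\big)\,d\mu_i\\ &= \int_{X_i}F_{i,n}(\sigma_i)\,d\mu_i + (F_i)'_\infty\gamma_i^\perp(X_i) + \int_{X_i}T_n(\varphi_i)\,d\gamma_i,\end{aligned} \]

and the same relation also holds when $(F_i)'_\infty = +\infty$ since in this case $\gamma_i^\perp = 0$. Summing up the two contributions we get

\[ \mathscr E(\tilde{\boldsymbol\gamma}|\mu_1,\mu_2)\ge\sum_i\Big(\int_{X_i}F_{i,n}(\sigma_i)\,d\mu_i + (F_i)'_\infty\gamma_i^\perp(X_i)\Big) + \int\big(T_n(\varphi_1)\oplus T_n(\varphi_2)\big)\,d\boldsymbol\gamma. \]

Applying Lemma 4.7 (i) and the fact that $\varphi_1\oplus_o\varphi_2 = c\ge 0$ $\boldsymbol\gamma$-a.e. by (4.21a), we can pass to the limit as $n\uparrow\infty$ by monotone convergence in the right-hand side, obtaining the desired optimality $\mathscr E(\tilde{\boldsymbol\gamma}|\mu_1,\mu_2)\ge\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)$. □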


Lemma 4.7. Let $F_{i,n}:[0,\infty)\to[0,\infty)$ be defined by (4.24). Then

(i) $F_{i,n}$ are Lipschitz, $F_{i,n}(s)\le F_i(s)$, and $F_{i,n}(s)\uparrow F_i(s)$ as $n\uparrow+\infty$.

(ii) For every $s\in\mathrm{Dom}(F_i)$ and $\varphi_i\in\mathbb R\cup\{+\infty\}$ we have

\[ \begin{aligned} -\varphi_i\in\partial F_i(s) \quad&\Longrightarrow\quad -T_n(\varphi_i)\in\partial F_{i,n}(s),\\ \varphi_i = +\infty,\ s = 0 \quad&\Longrightarrow\quad F_{i,n}(0) = F_i^\circ(T_n(\varphi_i)) = F_i^\circ(n).\end{aligned} \tag{4.26} \]

In particular, both cases considered in (4.26) give $F_i^\circ(T_n(\varphi_i)) = F_{i,n}(s)+sT_n(\varphi_i)$.

Proof. Property (i): By (2.22) and the definition in (4.24) we get $F_{i,n}\le F_i$. Since $-F_i^*(0) = \inf F_i\ge 0$ we see that $F_{i,n}$ are nonnegative. Recalling that $F_i^*$ are nondecreasing with $\mathrm{Dom}(F_i^*)\supset(-\infty,0]$ (see Section 2.3) we also get the upper bound $F_{i,n}(s)\le ns - F_i^*(-n)$. Eventually, (4.24) defines $F_{i,n}$ as the maximum of a family of $n$-Lipschitz functions, so $F_{i,n}$ is $n$-Lipschitz.

Property (ii): Notice that $F_{i,n} = (F_i^*+\mathrm I_{[-n,n]})^*$ so that $(F_{i,n})^* = F_i^*+\mathrm I_{[-n,n]}\ge F_i^*$. It is not difficult to check that $F_i(s) = F_{i,n}(s)$ if and only if $\partial F_i(s)\cap[-n,n]\ne\emptyset$. Therefore the set $I_n := \{s\ge 0 : F_i(s) = F_{i,n}(s)\}$ is a nonempty closed interval (possibly reduced to a single point) and it is easy to see that, denoting $s_n^+ := \max I_n$, $s_n^- := \min I_n$ and by $\bar s := s_n^-\vee s\wedge s_n^+$ the projection of $s$ onto $I_n$, we have $F_{i,n}(s) = F_i(\bar s)+n\,|s-\bar s|$. In particular, whenever $s\ge s_n^+$ we have $n\in\partial F_{i,n}(s)$ and similarly $-n\in\partial F_{i,n}(s)$ if $s\le s_n^-$. If $s$ belongs to the interior of $I_n$, then $\partial F_i(s) = \partial F_{i,n}(s)\subset[-n,n]$.

Therefore, if $\phi_i = -\varphi_i\in\partial F_i(s)$ with $\phi_i\in[-n,n]$, we have $F_i(s) = \phi_is - F_i^*(\phi_i) = F_{i,n}(s)$ so that $\phi_i\in\partial F_{i,n}(s)$. On the other hand, if $\partial F_i(s)\ni\phi_i>n$, then $s$ cannot belong to the interior of $I_n$, so that by monotonicity $s\ge s_n^+$ and $\partial F_{i,n}(s)\ni n = T_n(\phi_i) = -T_n(\varphi_i)$. The case when $\partial F_i(s)\ni\phi_i<-n$ is completely analogous.

Eventually, if $\phi_i = -\infty$ and $s = 0$ (in particular $F_i(0) = -F_i^*(-\infty)<\infty$), then (4.24) and the fact that $F_i^*$ is nondecreasing yield $F_{i,n}(0) = -F_i^*(-n) = F_i^\circ(n) = F_i^\circ(T_n(\varphi_i))$.

For the last statement in (ii) the case $T_n(\varphi_i) = \varphi_i$ is trivial. For $\varphi_i>n$ we have $-n\in\partial F_{i,n}(s)$, implying $F_{i,n}(s)+F_i^*(-n) = -ns$. Hence, we have

\[ F_i^\circ(T_n(\varphi_i)) = -F_i^*(-n) = F_{i,n}(s)+ns = F_{i,n}(s)+sT_n(\varphi_i). \]

The case $\varphi_i<-n$ is similar. □
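The properties in Lemma 4.7 (i) can be observed explicitly for the logarithmic entropy $F(r) = r\log r - r + 1$, whose conjugate is $F^*(\phi) = e^\phi - 1$ (a hand computation, not taken from the text). Since $\phi\mapsto\phi r - F^*(\phi)$ is concave with unconstrained maximiser $\log r$, the maximum over $|\phi|\le n$ in (4.24) is attained at $\log r$ clipped to $[-n,n]$. The sketch below uses this observation; the names `F_star` and `F_n` are illustrative.

```python
import math

def F(r):
    """Logarithmic entropy with F(0) = 1."""
    return r * math.log(r) - r + 1.0 if r > 0 else 1.0

def F_star(phi):
    """Legendre conjugate of F, computed by hand: F*(phi) = exp(phi) - 1."""
    return math.exp(phi) - 1.0

def F_n(r, n):
    """Truncated entropy max_{|phi| <= n} (phi * r - F*(phi)), cf. (4.24).
    The objective is concave in phi, so the maximiser is log(r) clipped to [-n, n]."""
    phi = -n if r <= 0 else max(-n, min(math.log(r), n))
    return phi * r - F_star(phi)

if __name__ == "__main__":
    for r in (0.0, 0.01, 1.0, 50.0):
        print(r, [round(F_n(r, n), 6) for n in (1, 2, 4, 8)], round(F(r), 6))
```

Each printed row is nondecreasing in $n$ and bounded by $F(r)$; at $r=0$ the values are $1-e^{-n} = F^\circ(n)$, matching the second implication in (4.26).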

4.3 A general duality result 

The aim of this section is to show in complete generality the duality result $\mathsf{ET} = \mathsf D$, by using the $\varphi$-formulation of the dual problem (4.12), which is equivalent to (4.7) by Proposition 4.3.

We start with a simple lemma depending on a specific feature of the entropy functions (which fails exactly in the case of pure transport problems, see Example E.3 of Section 3.3), using the strengthened feasibility condition in (3.14). First note that the couple $\varphi_i\equiv 0$ provides an obvious lower bound for $\mathsf D(\mu_1,\mu_2)$, viz.

\[ \mathsf D(\mu_1,\mu_2)\ge\mathscr D(0,0|\mu_1,\mu_2) = \sum_i m_i\inf F_i. \tag{4.27} \]

We derive an upper and lower bound for the potential $\varphi_1$ under the assumption that $c$ is bounded.

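Lemma 4.8. Let $m_i = \mu_i(X_i)$ and assume $\mathrm{int}\big(m_1\mathrm{Dom}(F_1)\big)\cap m_2\mathrm{Dom}(F_2)\ne\emptyset$, so that

\[ \exists\,s_1^-,s_1^+\in\mathrm{Dom}(F_1),\ s_2\in\mathrm{Dom}(F_2) : \quad m_1s_1^- < m_2s_2 < m_1s_1^+, \tag{4.28} \]

and $S := \sup c<\infty$. Then every couple $\boldsymbol\varphi = (\varphi_1,\varphi_2)\in\Phi$ with $\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)\ge\sum_i m_i\inf F_i$ satisfies

\[ \Phi_-\le\sup\varphi_1\le\Phi_+, \qquad \Phi_\pm := \frac{m_1\big(F_1(s_1^\mp)-\inf F_1\big)+m_2\big(F_2(s_2)-\inf F_2\big)+m_2s_2S}{m_2s_2-m_1s_1^\mp}. \tag{4.29} \]

Proof. Since $\boldsymbol\varphi = (\varphi_1,\varphi_2)\in\Phi$ satisfies $\sup\varphi_1+\sup\varphi_2\le S$, the definition of $\mathscr D$ in (4.10) and the monotonicity of $F_i^\circ$ yield

\[ \sum_i m_i\inf F_i\le\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)\le m_1F_1^\circ(\sup\varphi_1)+m_2F_2^\circ(S-\sup\varphi_1). \]

Using the dual bound $F_i^\circ(\varphi_i)\le\varphi_is_i+F_i(s_i)$ for $s_i\in\mathrm{Dom}(F_i)$ (cf. (4.9)) now implies

\[ \sum_i m_i\inf F_i\le\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)\le(m_1s_1-m_2s_2)\sup\varphi_1+m_1F_1(s_1)+m_2F_2(s_2)+m_2s_2S. \]

Exploiting (4.28), the choice $s_1 := s_1^-$ shows the upper bound in (4.29), and $s_1 := s_1^+$ the lower bound. □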

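We improve the previous result by showing that in the case of bounded cost functions it is sufficient to consider bounded potentials $\varphi_i$. The second lemma is well known in the case of Optimal Transport problems and will provide a useful a priori estimate in the case of bounded cost functions used in the proof of Theorem 4.11.

Lemma 4.9. If $\sup c = S<\infty$ then for every couple $\boldsymbol\varphi\in\Phi$ there exists $\tilde{\boldsymbol\varphi}\in\Phi$ such that $\mathscr D(\tilde{\boldsymbol\varphi}|\mu_1,\mu_2)\ge\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)$ and

\[ \sup\tilde\varphi_i-\inf\tilde\varphi_i\le S, \quad 0\le\sup\tilde\varphi_1+\sup\tilde\varphi_2\le S. \tag{4.30} \]

If moreover (3.14) holds, then there exists a constant $\varphi_{\max}\ge 0$ only depending on $F_i,m_i,S$ such that

\[ -\varphi_{\max}\le\inf\tilde\varphi_i\le\sup\tilde\varphi_i\le\varphi_{\max}. \tag{4.31} \]

Proof. Since $c\ge 0$, possibly replacing $\varphi_1$ with $\tilde\varphi_1 := \varphi_1\vee(-\sup\varphi_2)$ we obtain a new couple $(\tilde\varphi_1,\varphi_2)$ with

\[ \tilde\varphi_1\ge\varphi_1, \quad \tilde\varphi_1(x_1)+\varphi_2(x_2)\le\big(\varphi_1(x_1)+\varphi_2(x_2)\big)\vee 0\le c(x_1,x_2), \]

so that $(\tilde\varphi_1,\varphi_2)\in\Phi$ and $\mathscr D(\tilde\varphi_1,\varphi_2|\mu_1,\mu_2)\ge\mathscr D(\varphi_1,\varphi_2|\mu_1,\mu_2)$ since $F_1^\circ$ is nondecreasing. It is then not restrictive to assume that $\inf\varphi_1\ge-\sup\varphi_2$; a similar argument shows that we can assume $\inf\varphi_2\ge-\sup\varphi_1$. Since

\[ \sup\varphi_1+\sup\varphi_2\le S \tag{4.32} \]

we thus obtain a new couple $(\tilde\varphi_1,\tilde\varphi_2)\in\Phi$ with

\[ \mathscr D(\tilde\varphi_1,\tilde\varphi_2|\mu_1,\mu_2)\ge\mathscr D(\varphi_1,\varphi_2|\mu_1,\mu_2), \quad \sup\tilde\varphi_i-\inf\tilde\varphi_i\le S. \tag{4.33} \]

If moreover $\sup\tilde\varphi_1+\sup\tilde\varphi_2 = -\delta<0$, we could always add the constant $\delta$ to, e.g., $\tilde\varphi_1$, thus increasing the value of $\mathscr D$ while still preserving the constraint $\Phi$. Thus, (4.30) is established.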

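When (3.14) holds (e.g. in the case considered by (4.28)) the previous Lemma 4.8 provides constants $\varphi_\pm$ such that $\varphi_-\le\sup\tilde\varphi_1\le\varphi_+$. Now, (4.30) shows that $\varphi_-'\le\sup\tilde\varphi_2\le\varphi_+'$ with $\varphi_-' := -\varphi_+$ and $\varphi_+' := S-\varphi_-$. Applying (4.30) once again, we obtain (4.31) with $\varphi_{\max} := S+\varphi_+-\varphi_-$. □

Before stating the last lemma we recall the useful notion of $c$-transforms of functions $\varphi_i:X_i\to\bar{\mathbb R}$ for a real valued cost $c:\boldsymbol X\to[0,\infty)$, defined via

\[ \varphi_1^c(x_2) := \inf_{x\in X_1}\big(c(x,x_2)-\varphi_1(x)\big) \quad\text{and}\quad \varphi_2^c(x_1) := \inf_{x\in X_2}\big(c(x_1,x)-\varphi_2(x)\big). \tag{4.34} \]

It is well known that if $\varphi_1\oplus\varphi_2\le c$ with $\sup\varphi_i<\infty$ then

\[ \varphi_1^{cc} \text{ and } \varphi_1^c \text{ are bounded}, \quad \varphi_1^{cc}\oplus\varphi_1^c\le c, \quad \varphi_1^{cc}\ge\varphi_1, \quad\text{and}\quad \varphi_1^c\ge\varphi_2. \tag{4.35} \]

Moreover, $\varphi_i = \varphi_i^{cc}$ if and only if $\varphi_i = \varphi^c$ for some function $\varphi$; in this case $\varphi_i$ is called $c$-concave and $(\varphi_1^{cc},\varphi_1^c)$ is a couple of $c$-concave potentials.

Since $F_i^\circ$ are nondecreasing, it is also clear that whenever $\varphi_1^{cc},\varphi_1^c$ are $\mu_i$-measurable we have the estimate

\[ \mathscr D\big((\varphi_1,\varphi_2)|\mu_1,\mu_2\big)\le\mathscr D\big((\varphi_1^{cc},\varphi_1^c)|\mu_1,\mu_2\big) \quad\text{for every } \boldsymbol\varphi\in\mathrm B(X_1)\times\mathrm B(X_2),\ \varphi_1\oplus\varphi_2\le c. \tag{4.36} \]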
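On finite spaces the $c$-transforms of (4.34) are plain minima over rows or columns of the cost matrix, and the properties listed in (4.35) can be verified directly. The sketch below is illustrative only: the cost matrix `c`, the potentials `phi1`, `phi2` and the helper names are hypothetical sample data.

```python
def c_transform_first(phi1, c):
    """phi1^c(x2) = min over x in X1 of c[x][x2] - phi1[x]  (a function on X2)."""
    return [min(c[i][j] - phi1[i] for i in range(len(phi1))) for j in range(len(c[0]))]

def c_transform_second(phi2, c):
    """phi2^c(x1) = min over x in X2 of c[x1][x] - phi2[x]  (a function on X1)."""
    return [min(c[i][j] - phi2[j] for j in range(len(phi2))) for i in range(len(c))]

def is_admissible(phi1, phi2, c, tol=1e-12):
    """Check the constraint phi1 (+) phi2 <= c entrywise."""
    return all(phi1[i] + phi2[j] <= c[i][j] + tol
               for i in range(len(phi1)) for j in range(len(phi2)))

c = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 1.0],
     [4.0, 1.0, 0.0]]
phi1 = [0.0, -1.0, 0.5]
phi2 = [-0.5, 0.0, -1.0]

phi1_c = c_transform_first(phi1, c)        # improved second potential
phi1_cc = c_transform_second(phi1_c, c)    # improved first potential

if __name__ == "__main__":
    print("phi1^c  =", phi1_c)
    print("phi1^cc =", phi1_cc)
```

Both potentials can only increase under the double transform while the constraint is preserved, which is the mechanism behind the estimate (4.36) when $F_i^\circ$ are nondecreasing.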

The next lemma concerns the lower semicontinuity of $\varphi_1^c$ in the case when $c$ is simple (cf. [23]), i.e. it has the form

\[ c = \sum_{n=1}^N c_n\,\chi_{A_n^1\times A_n^2}, \quad\text{with } c_n\ge 0 \text{ and } A_n^i \text{ open in } X_i. \tag{4.37} \]

Lemma 4.10. Let us assume that $c$ has the form (4.37) and that $\boldsymbol\varphi\in\mathrm B_s(X_1)\times\mathrm B_s(X_2)$ is a couple of simple functions taking values in $\mathrm{Dom}(F_1^\circ)\times\mathrm{Dom}(F_2^\circ)$ and satisfying $\varphi_1\oplus\varphi_2\le c$. Then $(\varphi_1^{cc},\varphi_1^c)\in\Phi$ with $\mathscr D\big((\varphi_1^{cc},\varphi_1^c)|\mu_1,\mu_2\big)\ge\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)$.

Proof. It is easy to check that $\varphi_1^{cc},\varphi_1^c$ are simple, since the infima in (4.34) are taken on a finite number of possible values. By (4.35) it is thus sufficient to check that they are lower semicontinuous functions.

We do this for $\varphi_1^c$; the argument for $\varphi_1^{cc} = (\varphi_1^c)^c$ is completely analogous. For this, consider the sets

\[ Z := \big\{z = (z_n)_{n=1}^N\in\{0,1\}^N : \exists\,y\in X_1\ \forall n = 1,\dots,N : z_n = \chi_{A_n^1}(y)\big\}, \]
\[ Y_z := \big\{y\in X_1 : \forall n = 1,\dots,N : \chi_{A_n^1}(y) = z_n\big\}. \]

Clearly, $(Y_z)_{z\in Z}$ defines a Borel partition of $X_1$; we define $\varphi_z := \sup\{\varphi_1(y) : y\in Y_z\}$.

By construction, for every $z\in Z$ and $y\in Y_z$ the map $f_z(x) := c(y,x)-\varphi_z$ is independent of $y$ in $Y_z$ and it is lower semicontinuous w.r.t. $x\in X_2$ since $c$ is lower semicontinuous. Since $\varphi_1^c(x_2)$ is the minimum of a finite collection of lower semicontinuous functions, viz.

\[ \varphi_1^c(x_2) = \min\{f_z(x_2) : z\in Z\}, \tag{4.38} \]

we obtain $\varphi_1^c\in\mathrm{LSC}(X_2)$. □

With all these auxiliary results at hand, we are now ready to prove our main result concerning the dual representation using Theorem 2.4.

Theorem 4.11. In the basic coercive setting of Section 3.1 (i.e. (3.2a) or (3.2b) hold), the Entropy-Transport functional (3.4) and the dual functional (4.10) satisfy

\[ \inf_{\boldsymbol\gamma\in\mathcal M(X_1\times X_2)}\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2) = \sup_{\boldsymbol\varphi\in\Phi}\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2) \quad\text{for every } \mu_i\in\mathcal M(X_i), \tag{4.39} \]

i.e. $\mathsf{ET}(\mu_1,\mu_2) = \mathsf D(\mu_1,\mu_2)$ for every $\mu_i\in\mathcal M(X_i)$.

Proof. Since $\mathsf{ET}\ge\mathsf D$ is obvious, it suffices to show $\mathsf{ET}\le\mathsf D$. In particular, it is not restrictive to assume that $\mathsf D(\mu_1,\mu_2)$ is finite. We proceed in various steps, considering first the case when $c$ has compact sublevels. We will assume that $(F_i)'_\infty = +\infty$ (so that $F_i^\circ$ are continuous and increasing on $\mathbb R$, and $F_i^\circ\circ\varphi_i\in\mathrm{LSC}_b(X_i)$ whenever $\varphi_i\in\mathrm{LSC}_b(X_i)$), and we will remove the compactness assumption on the sublevels of $c$ in the following steps.

Step 1: The cost $c$ has compact sublevels. We can directly apply Theorem 2.4 to the saddle functional $\mathscr L$ of (4.3) by choosing $A = \mathrm M$ given by (4.4) endowed with the narrow topology and $B = \Psi$. Conditions (2.9a) and (2.9b) are clearly satisfied and the coercivity assumption $(F_1)'_\infty+(F_2)'_\infty+\min c>0$ shows that we can choose $\bar{\boldsymbol\psi} = (\bar\psi_1,\bar\psi_2)$ with constant functions $\bar\psi_i$ and $-R_i^*(\bar\psi_i) = -\bar\varphi_i = \bar\phi_i\in[0,(F_i)'_\infty]$ such that

\[ D := \min\big(c-(\bar\varphi_1\oplus\bar\varphi_2)\big) = \bar\phi_1+\bar\phi_2+\min c>0, \quad \bar\psi_i>-\infty. \]

Arguing as in the proof of Theorem 3.3 (ii) we immediately see that (2.11) is satisfied, since

\[ \mathscr L(\boldsymbol\gamma,\bar{\boldsymbol\psi}) = \int(c-\min c)\,d\boldsymbol\gamma+D\,\boldsymbol\gamma(\boldsymbol X)+\sum_i\bar\psi_i\,\mu_i(X_i). \]

In fact, for $C$ sufficiently big, the sublevels $\{\boldsymbol\gamma\in\mathrm M : \mathscr L(\boldsymbol\gamma,\bar{\boldsymbol\psi})\le C\}$ are closed, bounded (since $D>0$) and equally tight (by the compactness of the sublevels of $c$), thus narrowly compact. Thus, (4.39), i.e. $\mathsf{ET} = \mathsf D$, follows from Theorem 2.4.

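Step 2: The case when $\mu_i$ have compact support, (3.14) holds and the cost $c$ is simple, i.e. (4.37) holds. Let us set $\tilde X_i := \mathrm{supp}(\mu_i)$. Since $(F_i)'_\infty = +\infty$ the support of all $\boldsymbol\gamma$ with $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)<\infty$ is contained in $\tilde X_1\times\tilde X_2$, so that the minimum of the functional $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)$ does not change by restricting the spaces to $\tilde X_i$. By applying the previous step to the problem stated in $\tilde X_1\times\tilde X_2$, for every $E<\mathsf{ET}(\mu_1,\mu_2)$ we find $\boldsymbol\varphi\in\mathrm{LSC}_s(\tilde X_1)\times\mathrm{LSC}_s(\tilde X_2)$ such that $\varphi_1\oplus\varphi_2\le c$ in $\tilde X_1\times\tilde X_2$, that $F_i^\circ(\varphi_i)$ is finite, and that $\sum_i\int_{\tilde X_i}F_i^\circ(\varphi_i)\,d\mu_i\ge E$.

Extending $\varphi_i$ to $-\sup c$ in $X_i\setminus\tilde X_i$ the value of $\mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)$ does not change and we obtain a couple of simple Borel functions with $\varphi_1\oplus\varphi_2\le c$ in $\boldsymbol X$. We can eventually apply Lemma 4.10 to find $(\varphi_1^{cc},\varphi_1^c)\in\Phi$ with $\mathscr D\big((\varphi_1^{cc},\varphi_1^c)|\mu_1,\mu_2\big)\ge E$. Since $E<\mathsf{ET}(\mu_1,\mu_2)$ was arbitrary, we conclude that (4.39) holds in this case as well.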

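Step 3: We remove the assumption on the compactness of $\mathrm{supp}(\mu_i)$.

Since $\mu_i$ are Radon, we can find two sequences of compact sets $K_{i,n}\subset X_i$ such that $\varepsilon_{i,n} := \mu_i(X_i\setminus K_{i,n})\to 0$ as $n\to\infty$, i.e. $\mu_{i,n} := \chi_{K_{i,n}}\cdot\mu_i$ is narrowly converging to $\mu_i$. Let $E_n := \mathsf{ET}(\mu_{1,n},\mu_{2,n})$ and let $E_n'<E_n$ with $\lim_{n\to\infty}E_n' = \liminf_{n\to\infty}E_n$. Since $\mu_{i,n}$ have compact support, by the previous step and Lemma 4.9 we can find a sequence $\boldsymbol\varphi_n\in\Phi$ and a constant $\varphi_{\max}$ independent of $n$ such that

\[ \mathscr D(\boldsymbol\varphi_n|\mu_{1,n},\mu_{2,n})\ge E_n' \quad\text{and}\quad \sup|\varphi_{i,n}|\le\varphi_{\max}. \]

This yields

\[ \mathscr D(\boldsymbol\varphi_n|\mu_1,\mu_2)\ge\sum_i\int_{K_{i,n}}F_i^\circ(\varphi_{i,n})\,d\mu_i+\sum_iF_i^\circ(-\varphi_{\max})\,\varepsilon_{i,n}\ge E_n'+\sum_iF_i^\circ(-\varphi_{\max})\,\varepsilon_{i,n}. \]

Using the lower semicontinuity of $\mathsf{ET}$ from Lemma 3.9 we obtain

\[ \mathsf D(\mu_1,\mu_2)\ge\liminf_{n\to\infty}\mathscr D(\boldsymbol\varphi_n|\mu_1,\mu_2)\ge\lim_{n\to\infty}E_n' = \liminf_{n\to\infty}\mathsf{ET}(\mu_{1,n},\mu_{2,n})\ge\mathsf{ET}(\mu_1,\mu_2). \]

Thus, (4.39) is established.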

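Step 4: We remove the assumption (3.14) on $F_i$. It is sufficient to approximate $F_i$ by an increasing and pointwise converging sequence $F_i^n\in\Gamma(\mathbb R_+)$. The corresponding sequence $(F_i^n)^\circ:\varphi_i\mapsto\inf_{s>0}\big(F_i^n(s)+s\varphi_i\big)$ of conjugate concave functions is also nondecreasing and pointwise converging to $F_i^\circ$. By the previous step, if $E_n<\mathsf{ET}^n(\mu_1,\mu_2)$ with $\lim_{n\to\infty}E_n = \lim_{n\to\infty}\mathsf{ET}^n(\mu_1,\mu_2) = \mathsf{ET}(\mu_1,\mu_2)$ we can find $\boldsymbol\varphi_n\in\Phi$ such that

\[ E_n\le\sum_i\int_{X_i}(F_i^n)^\circ(\varphi_{i,n})\,d\mu_i\le\sum_i\int_{X_i}F_i^\circ(\varphi_{i,n})\,d\mu_i = \mathscr D(\boldsymbol\varphi_n|\mu_1,\mu_2). \]

Passing to the limit $n\to\infty$ we conclude $\mathsf{ET}(\mu_1,\mu_2)\le\mathsf D(\mu_1,\mu_2)$ as desired.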

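Step 5: The case of a general cost $c$.

Let $c:\boldsymbol X\to[0,\infty]$ be an arbitrary l.s.c. cost and let us denote by $(c^\alpha)_{\alpha\in\mathbb A}$ the class of costs characterized by (4.37) and majorized by $c$. Then, $\mathbb A$ is a directed set with the pointwise order $\le$, since maxima of a finite number of cost functions in $\mathbb A$ can still be expressed as in (4.37). It is not difficult to check that $c = \sup_{\alpha\in\mathbb A}c^\alpha = \lim_{\alpha\in\mathbb A}c^\alpha$ so that by Lemma 3.9 $\mathsf{ET}(\mu_1,\mu_2) = \lim_{\alpha\in\mathbb A}\mathsf{ET}^\alpha(\mu_1,\mu_2) = \sup_{\alpha\in\mathbb A}\mathsf{ET}^\alpha(\mu_1,\mu_2)$, where $\mathsf{ET}^\alpha$ denotes the Entropy-Transport functional associated to $c^\alpha$.

Thus for every $E<\mathsf{ET}(\mu_1,\mu_2)$ we can find $\alpha\in\mathbb A$ such that $\mathsf{ET}^\alpha(\mu_1,\mu_2)>E$ and therefore, by the previous step, a couple $\boldsymbol\varphi^\alpha\in\mathrm{LSC}_s(X_1)\times\mathrm{LSC}_s(X_2)$ with $F_i^\circ(\varphi_i^\alpha)$ finite such that $\varphi_1^\alpha\oplus\varphi_2^\alpha\le c^\alpha$ in $\boldsymbol X$ and $\mathscr D(\boldsymbol\varphi^\alpha|\mu_1,\mu_2)>E$. Since $c^\alpha\le c$ we have $\boldsymbol\varphi^\alpha\in\Phi$ and $\mathsf{ET}(\mu_1,\mu_2)\le\mathsf D(\mu_1,\mu_2)$ follows. □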
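The duality of Theorem 4.11 can be observed in the simplest possible setting, where both spaces reduce to a single point. The sketch below rests on assumptions made here for illustration: $\mu_i = m_i\delta$, a constant cost $c_0$, and the logarithmic entropy $F(s) = s\log s - s + 1$ with $F^\circ(\varphi) = 1-e^{-\varphi}$. A plan is then a number $g\ge0$, a dual couple is $(\varphi_1,c_0-\varphi_1)$ (taking $\varphi_2$ as large as the constraint allows, since $F^\circ$ is increasing), and both problems are solved by grid search. The closed form $m_1+m_2-2\sqrt{m_1m_2}\,e^{-c_0/2}$ is a hand computation for this special case.

```python
import math

def F(s):
    """Logarithmic entropy with F(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0

def F_circ(phi):
    """Concave conjugate inf_s (F(s) + s * phi) = 1 - exp(-phi) (hand computation)."""
    return 1.0 - math.exp(-phi)

def primal(g, m1, m2, c0):
    """Primal energy of the plan g * delta on the one-point product space."""
    return m1 * F(g / m1) + m2 * F(g / m2) + c0 * g

def dual(phi1, m1, m2, c0):
    """Dual functional at the couple (phi1, c0 - phi1), which saturates phi1 + phi2 <= c0."""
    return m1 * F_circ(phi1) + m2 * F_circ(c0 - phi1)

def closed_form(m1, m2, c0):
    return m1 + m2 - 2.0 * math.sqrt(m1 * m2) * math.exp(-c0 / 2.0)

m1, m2, c0 = 2.0, 0.5, 1.0
primal_value = min(primal(k / 10000.0, m1, m2, c0) for k in range(0, 50001))
dual_value = max(dual(k / 1000.0, m1, m2, c0) for k in range(-5000, 5001))

if __name__ == "__main__":
    print("primal:", primal_value, "dual:", dual_value, "closed form:", closed_form(m1, m2, c0))
```

Up to the grid resolution the two values coincide, and the dual value never exceeds the primal one, as (4.13) requires.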

Arguing as in Remark 1231 we can change the spaces of test potentials $\boldsymbol\varphi = (\varphi_1,\varphi_2)\in\Phi$, see (4.11).

Corollary 4.12. The duality formula (4.39) still holds if we replace the spaces of simple lower semicontinuous functions $\mathrm{LSC}_s(X_i)$ in the definition of $\Phi$ with the spaces of bounded lower semicontinuous functions $\mathrm{LSC}_b(X_i)$ or with the spaces of bounded Borel functions $\mathrm B_b(X_i)$.

If $(X_i,\tau_i)$ are completely regular spaces, then we can equivalently replace lower semicontinuous functions by continuous ones, obtaining

\[ \begin{aligned}\mathsf{ET}(\mu_1,\mu_2) &= \sup\Big\{\sum_i\int_{X_i}F_i^\circ(\varphi_i)\,d\mu_i : \varphi_i,\ F_i^\circ(\varphi_i)\in\mathrm C_b(X_i),\ \varphi_1\oplus\varphi_2\le c\Big\}\\ &= \sup\Big\{\sum_i\int_{X_i}\psi_i\,d\mu_i : \psi_i,\ R_i^*(\psi_i)\in\mathrm C_b(X_i),\ R_1^*(\psi_1)\oplus R_2^*(\psi_2)\le c\Big\}.\end{aligned} \tag{4.40} \]

Corollary 4.13 (Subadditivity of $\mathsf{ET}$). The functional $\mathsf{ET}$ is convex and positively 1-homogeneous (in particular it is subadditive), i.e. for every $\mu_i,\mu_i'\in\mathcal M(X_i)$ and $\lambda\ge 0$ we have

\[ \mathsf{ET}(\lambda\mu_1,\lambda\mu_2) = \lambda\,\mathsf{ET}(\mu_1,\mu_2), \quad \mathsf{ET}(\mu_1+\mu_1',\mu_2+\mu_2')\le\mathsf{ET}(\mu_1,\mu_2)+\mathsf{ET}(\mu_1',\mu_2'). \tag{4.41} \]

Proof. By Theorem 4.11 it is sufficient to prove the corresponding property of $\mathsf D$, which follows immediately from its representation formula (4.8) as a supremum of linear functionals. □
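Homogeneity and subadditivity can be tested in the one-point model, under the same illustrative assumptions as a hand computation allows: $\mu_i = m_i\delta$, constant cost $c_0$ and logarithmic entropy, for which minimising the primal energy over plans $g\delta$ gives $\mathsf{ET} = m_1+m_2-2\sqrt{m_1m_2}\,e^{-c_0/2}$. The function names and sample masses below are hypothetical.

```python
import math

def F(s):
    """Logarithmic entropy with F(0) = 1."""
    return s * math.log(s) - s + 1.0 if s > 0 else 1.0

def primal_energy(g, m1, m2, c0):
    """Energy of the plan g * delta for the one-point model."""
    return m1 * F(g / m1) + m2 * F(g / m2) + c0 * g

def ET_one_point(m1, m2, c0):
    """Hand-derived minimum of primal_energy over g >= 0 (one-point model only)."""
    return m1 + m2 - 2.0 * math.sqrt(m1 * m2) * math.exp(-c0 / 2.0)

c0 = 1.0
a = (2.0, 0.5)      # masses (m1, m2) of a first pair of measures
b = (0.3, 4.0)      # masses of a second pair
lam = 3.5

lhs_hom = ET_one_point(lam * a[0], lam * a[1], c0)
rhs_hom = lam * ET_one_point(a[0], a[1], c0)
lhs_sub = ET_one_point(a[0] + b[0], a[1] + b[1], c0)
rhs_sub = ET_one_point(a[0], a[1], c0) + ET_one_point(b[0], b[1], c0)

if __name__ == "__main__":
    print("homogeneity  :", lhs_hom, rhs_hom)
    print("subadditivity:", lhs_sub, "<=", rhs_sub)
```

In this model subadditivity reduces to the superadditivity of the geometric mean $\sqrt{m_1m_2}$, a consequence of the Cauchy-Schwarz inequality.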

4.4 Existence of optimal Entropy-Kantorovich potentials 

In this section we will consider two cases, when the dual problem admits a couple of optimal Entropy-Kantorovich potentials $\boldsymbol\varphi = (\varphi_1,\varphi_2)$.

The first case is completely analogous to the transport setting.

Theorem 4.14. Consider complete metric spaces $(X_i,\mathsf d_i)$, $i = 1,2$, and assume that (3.14) holds, and $c$ is bounded and uniformly continuous with respect to the product distance $\mathsf d$ in $\boldsymbol X = X_1\times X_2$. Then there exists a couple of optimal Entropy-Kantorovich potentials $\boldsymbol\varphi\in\mathrm C_b(X_1)\times\mathrm C_b(X_2)$ satisfying

\[ \varphi_1\oplus\varphi_2\le c, \quad \mathsf{ET}(\mu_1,\mu_2) = \mathscr D(\boldsymbol\varphi|\mu_1,\mu_2). \tag{4.42} \]

Proof. By the boundedness and uniform continuity of $c$ we can find a continuous and concave modulus of continuity $\omega:[0,+\infty)\to[0,+\infty)$ with $\omega(0) = 0$ such that

\[ |c(x_1',x_2)-c(x_1,x_2)|\le\omega(\mathsf d_1(x_1,x_1')), \quad |c(x_1,x_2')-c(x_1,x_2)|\le\omega(\mathsf d_2(x_2,x_2')). \]

Possibly replacing the distances $\mathsf d_i$ with $\mathsf d_i+\omega(\mathsf d_i)$, we may assume that $x_1\mapsto c(x_1,x_2)$ is 1-Lipschitz w.r.t. $\mathsf d_1$ for every $x_2\in X_2$ and $x_2\mapsto c(x_1,x_2)$ is 1-Lipschitz with respect to $\mathsf d_2$ for every $x_1\in X_1$. In particular, every $c$-transform (4.34) of a bounded function is 1-Lipschitz (and in particular Borel).

Let $\boldsymbol\varphi_n$ be a maximizing sequence in $\Phi$. By Lemma 4.9 we can assume that $\boldsymbol\varphi_n$ is uniformly bounded; by (4.35) and (4.36) we can also assume that $\boldsymbol\varphi_n$ are $c$-concave and thus 1-Lipschitz. If $K_{i,n}$ is a family of compact sets whose union $A_i$ has full $\mu_i$-measure in $X_i$, we can thus extract a subsequence (still denoted by $\boldsymbol\varphi_n$) pointwise convergent to $\boldsymbol\varphi = (\varphi_1,\varphi_2)$ in $A_1\times A_2$. Setting $\varphi_1 := \limsup_{n\to\infty}\varphi_{1,n}$ and $\varphi_2 := \liminf_{n\to\infty}\varphi_{2,n}$, we obtain a family $\varphi_i\in\mathrm B_b(X_i)$ with $\varphi_1\oplus\varphi_2\le c$, $\varphi_i\ge-(F_i)'_\infty$ and

\[ \mathscr D(\boldsymbol\varphi|\mu_1,\mu_2)\ge\limsup_{n\to\infty}\mathscr D(\boldsymbol\varphi_n|\mu_1,\mu_2), \]

thanks to Fatou's Lemma and the fact that $F_i^\circ(\varphi_{i,n})$ are uniformly bounded from above. Eventually replacing $(\varphi_1,\varphi_2)$ with $(\varphi_1^{cc},\varphi_1^c)$ we obtain a couple in $\mathrm C_b(X_1)\times\mathrm C_b(X_2)$ satisfying (4.42). □


The next result is of a different type, since it does not require any boundedness or regularity of $c$ (which can also assume the value $+\infty$ in the case $F_i(0)<\infty$).

Theorem 4.15. Let us suppose that at least one of the following two conditions holds:

a) $c$ is everywhere finite and (3.14) holds

or

b) $F_i(0)<+\infty$.

Then a plan $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ with finite energy $\mathscr E(\boldsymbol\gamma|\mu_1,\mu_2)<\infty$ is optimal if and only if there exists a couple $\boldsymbol\varphi$ as in (4.22) satisfying the optimality conditions (4.21).

Proof. We already proved (Theorem 4.6) that the existence of a couple $\boldsymbol\varphi$ as in (4.22) satisfying (4.21) yields the optimality of $\boldsymbol\gamma$.

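Let us now assume that $\boldsymbol\gamma\in\mathcal M(\boldsymbol X)$ has finite energy and is optimal. If both $\mu_i$ are the null measure then also $\boldsymbol\gamma = 0$ and (4.21) are always satisfied, since we can choose $\varphi_i\equiv 0$.

We can therefore assume that at least one of the measures $\mu_i$, say $\mu_2$, has positive mass. Let $\boldsymbol\gamma\in\mathrm{Opt}_{\mathsf{ET}}(\mu_1,\mu_2)$, and let us apply Theorem 4.11 to find a maximizing sequence $\boldsymbol\varphi_n\in\Phi$ such that $\lim_{n\to\infty}\mathscr D(\boldsymbol\varphi_n|\mu_1,\mu_2) = \mathsf{ET}(\mu_1,\mu_2)$.

Using the Borel partitions $(A_i,A_{\mu_i},A_{\gamma_i})$ for the couples of measures $\gamma_i,\mu_i$ provided by Lemma 2.3 and arguing as in Proposition 4.4 we get

\[ \lim_{n\to\infty}\int_{X_1\times X_2}\big(c(x_1,x_2)-\varphi_{1,n}(x_1)-\varphi_{2,n}(x_2)\big)\,d\boldsymbol\gamma = 0, \]
\[ \lim_{n\to\infty}\int_{A_i\cup A_{\mu_i}}\big(F_i(\sigma_i)+\sigma_i\varphi_{i,n}-F_i^\circ(\varphi_{i,n})\big)\,d\mu_i = 0, \]
\[ \lim_{n\to\infty}\int_{A_{\gamma_i}}\big(\varphi_{i,n}+(F_i)'_\infty\big)\,d\gamma_i^\perp = 0. \]

Since all the integrands are nonnegative, up to selecting a suitable subsequence (not relabeled) we can assume that the integrands are converging pointwise a.e. to 0. We can thus find Borel sets $A_i'\subset A_i$, $A_{\mu_i}'\subset A_{\mu_i}$, $A_{\gamma_i}'\subset A_{\gamma_i}$ and $A'\subset\boldsymbol X$ with $\pi^i(A')\subset A_i'\cup A_{\gamma_i}'$, $(\mu_i+\gamma_i)\big((A_i\setminus A_i')\cup(A_{\mu_i}\setminus A_{\mu_i}')\cup(A_{\gamma_i}\setminus A_{\gamma_i}')\big) = 0$, and $\boldsymbol\gamma(\boldsymbol X\setminus A') = 0$ such that

\[ c(x_1,x_2)<\infty, \quad \lim_{n\to\infty}c(x_1,x_2)-\varphi_{1,n}(x_1)-\varphi_{2,n}(x_2) = 0 \quad\text{in } A', \tag{4.43} \]
\[ F_i(\sigma_i)<\infty, \quad \lim_{n\to\infty}F_i(\sigma_i)+\sigma_i\varphi_{i,n}-F_i^\circ(\varphi_{i,n}) = 0 \quad\text{in } A_i'\cup A_{\mu_i}', \tag{4.44} \]
\[ \lim_{n\to\infty}\big(\varphi_{i,n}+(F_i)'_\infty\big) = 0 \quad\text{in } A_{\gamma_i}'. \tag{4.45} \]

For every $x_i\in X_i$ we define the Borel functions $\varphi_1(x_1) := \limsup_{n\to\infty}\varphi_{1,n}(x_1)$ and $\varphi_2(x_2) := \liminf_{n\to\infty}\varphi_{2,n}(x_2)$, taking values in $\mathbb R\cup\{\pm\infty\}$. It is clear that the couple $\boldsymbol\varphi = (\varphi_1,\varphi_2)$ complies with (4.22), (4.21d) and (4.21c).

If $\boldsymbol\gamma(\boldsymbol X) = 0$ then (4.21a) and (4.21b) are trivially satisfied, so that it is not restrictive to assume $\boldsymbol\gamma(\boldsymbol X)>0$.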

If pi{Xi) = 0 then (Fi)'^ is hnite (since 75 *“ (Xi) = 71 (Xi) = 7 (X) > 0) and pi = 
i^iYoo ^71 on A'. It follows that (^ 2 ( 2 ^ 2 ) = c(xi,a: 2 ) — {Fi)'^ G M on W so that 
fl4.21ap is satished. Since (^ 2 ( 2 ^ 2 ) is an accumulation point of P 2 ,nix 2 ) Lemma 14. 191 below 
yields —p 2 {x 2 ) G dF 2 {a 2 {x 2 )) in A '2 so that fl4.21bp is also satished (in the case i = 1 one 
can choose = 0 ). 

We can thus assume that /ii(Xj) > 0 and 7 (X) > 0. In order to check fl4.21ap and 
fl4.21bD we distinguish two cases. 
Case a: $c$ is everywhere finite and (3.14) holds. Let us first prove that $\varphi_1 < +\infty$ everywhere.

By contradiction, if there is a point $\bar x_1 \in X_1$ such that $\varphi_1(\bar x_1) = +\infty$ we deduce that $\varphi_2(x_2) = -\infty$ for every $x_2 \in X_2$.

Since the set $A'_2 \cup A'_{\mu_2}$ has positive $\mu_2$-measure, it contains some point $\bar x_2$. Equation (4.44) and Lemma 4.19 below (with $F = F_2$, $s = \sigma_2(\bar x_2)$, $\phi_n := -\varphi_{2,n}(\bar x_2)$) yield $s_2^+ = \max \mathrm{Dom}(F_2) = \sigma_2(\bar x_2) < \infty$ and $\sigma_2 \equiv s_2^+$ in $A'_2 \cup A'_{\mu_2}$. We thus have $\mathrm{Dom}(F_2) \subset [0, s_2^+]$, $(F_2)'_\infty = +\infty$ and therefore $m_2 s_2^+ = \gamma(\boldsymbol X)$.

On the other hand, if $\varphi_2 \equiv -\infty$ in $X_2$ we deduce that $\varphi_1(x_1) = +\infty$ for every $x_1 \in \pi^1(A')$. Since $(F_1)'_\infty \ge 0$, it follows that $\gamma_1(A'_{\gamma_1}) = 0$ (i.e. $\gamma_1^\perp = 0$) so that there is a point $a_1$ in $A'_1$ such that $\varphi_1(a_1) = +\infty$. Arguing as before, a further application of Lemma 4.19 yields that $\sigma_1 \equiv s_1^- = \min \mathrm{Dom}(F_1)$ $\mu_1$-a.e. It follows that $m_1 s_1^- = \gamma_1(X_1) = \gamma(\boldsymbol X) = m_2 s_2^+$, a situation that contradicts (3.14).

Since $\mu_1(X_1) > 0$ the same argument shows that $\varphi_2 < \infty$ everywhere in $X_2$. It follows that (4.21a) holds and $\varphi_i > -\infty$ on $A'$. Since $\varphi_i(x_i)$ is an accumulation point of $\varphi_{i,n}(x_i)$, Lemma 4.19 below yields $-\varphi_i(x_i) \in \partial F_i(\sigma_i(x_i))$ in $A'_i$ so that (4.21b) is also satisfied.

Case b: $F_i(0) < \infty$. In this case $F_i^\circ$ are bounded from above and $\varphi_i \ge -(F_i)'_\infty$ everywhere in $X_i$. By Theorem 4.11 $\lim_{n\to\infty} \sum_i \int F_i^\circ(\varphi_{i,n})\,d\mu_i > -\infty$, so that Fatou's Lemma yields $F_1^\circ(\varphi_1) \in L^1(X_1,\mu_1)$ and $\varphi_1(x_1) > -\infty$ for $\mu_1$-a.e. $x_1 \in X_1$, in particular for $(\mu_1+\gamma_1)$-a.e. $x_1 \in A'_1$. Applying Lemma 4.19 below, since $\sigma_1(x_1) > 0 = \min \mathrm{Dom}(F_1)$ in $A'_1$, we deduce that $-\varphi_1(x_1) \in \partial F_1(\sigma_1(x_1))$ for $(\mu_1+\gamma_1)$-a.e. $x_1 \in A'_1$, i.e. (4.21b) for $i = 1$. Since we already checked that (4.21c) and (4.21d) hold, applying Lemma 2.6 (with $\phi := -\varphi_1$ and $\psi := F_1^\circ(\varphi_1)$) we get $\varphi_1 \in L^1(X_1,\gamma_1)$, in particular $\varphi_1 \circ \pi^1 \in \mathbb R$ $\gamma$-a.e. in $\boldsymbol X$. It follows that (4.21a) holds and $\varphi_2 \circ \pi^2 \in L^1(\boldsymbol X,\gamma)$ so that $\varphi_2 \in \mathbb R$ $(\mu_2+\gamma_2)$-a.e. in $A_2$. A further application of Lemma 4.19 yields (4.21b) for $i = 2$. □


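Corollary 4.16. Let us suppose that $\mathrm{Dom}(F_i) \supset (0,\infty)$ and $F_i$ are differentiable in $(0,\infty)$. A plan $\gamma \in \mathcal M(\boldsymbol X)$ with $\mathscr E(\gamma|\mu_1,\mu_2) < \infty$ belongs to $\mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ if and only if there exist Borel partitions $(A_i, A_{\mu_i}, A_{\gamma_i})$ and corresponding Borel densities $\sigma_i$ associated to $\gamma_i$ and $\mu_i$ as in Lemma 2.3 such that setting
\[
\varphi_i(x_i) := \begin{cases} -F_i'(\sigma_i(x_i)) & \text{if } x_i \in A_i,\\ -(F_i)'_0 & \text{if } x_i \in A_{\mu_i},\\ -(F_i)'_\infty & \text{if } x_i \in X_i \setminus (A_i \cup A_{\mu_i}), \end{cases} \tag{4.46}
\]
we have
\[
\varphi_1 \oplus_o \varphi_2 \le c \ \text{ in } X_1 \times X_2, \qquad \varphi_1 \oplus \varphi_2 = c \ \ \gamma\text{-a.e. in } (A_1 \cup A_{\gamma_1}) \times (A_2 \cup A_{\gamma_2}). \tag{4.47}
\]

Proof. Since $\partial F_i(s) = \{F_i'(s)\}$ for every $s \in (0,\infty)$ and $F_i^\circ(\varphi_i) = F_i(0)$ if and only if $\varphi_i \in [-(F_i)'_0, +\infty]$, (4.47) is clearly a necessary condition for optimality, thanks to Theorem 4.15. Since $(F_i)'_0 \le F_i'(s) \le (F_i)'_\infty$, Theorem 4.6 shows that conditions (4.46)-(4.47) are also sufficient. □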

The next result shows that (4.46)-(4.47) take an even simpler form when $-(F_i)'_0 = (F_i)'_\infty = +\infty$; in particular, by assuming that $c$ is continuous, the support of an optimal plan $\gamma$ cannot be too small.

Corollary 4.17 (Spread of the support). Let us suppose that

• $c : \boldsymbol X \to [0,\infty]$ is continuous.

• $\mathrm{Dom}(F_i) \supset (0,\infty)$, $F_i$ are differentiable in $(0,\infty)$, and $-(F_i)'_0 = (F_i)'_\infty = +\infty$.

Then, $\gamma$ is an optimal plan if and only if $\gamma_i \ll \mu_i$, for every $x_i \in \mathrm{supp}(\mu_i)$ we have $c(x_1,x_2) = +\infty$ if $x_1 \in \mathrm{supp}\,\mu_1 \setminus \mathrm{supp}\,\gamma_1$ or $x_2 \in \mathrm{supp}\,\mu_2 \setminus \mathrm{supp}\,\gamma_2$, and there exist Borel sets $A_i \subset \mathrm{supp}\,\gamma_i$ with $\gamma_i(X_i \setminus A_i) = 0$ and Borel densities $\sigma_i : A_i \to (0,\infty)$ of $\gamma_i$ w.r.t. $\mu_i$ such that
\[
F_1'(\sigma_1) \oplus F_2'(\sigma_2) \ge -c \ \text{ in } A_1 \times A_2, \qquad F_1'(\sigma_1) \oplus F_2'(\sigma_2) = -c \ \ \gamma\text{-a.e. in } A_1 \times A_2. \tag{4.48}
\]

Remark 4.18. Apart from the case of pure transport problems (Example E.3 of Section 
D, where the existence of Kantorovich potentials is well known (see [m Thm. 5.10]), 
Theorem 4.15 covers essentially all the interesting cases, at least when the cost $c$ takes finite values if $0 \notin \mathrm{Dom}(F_i)$. In fact, if the strengthened feasibility condition (3.14) does not hold, it is not difficult to construct an example of optimal plan $\gamma$ for which conditions (4.22), (4.21a), (4.21b) cannot be satisfied. Consider e.g. $X_i = \mathbb R$, $c(x_1,x_2) := \tfrac12|x_1 - x_2|^2$,
/ii := /i 2 := ^ Dom(F)i = [a, 1], Dom(F )2 = [1,6] with arbitrary 

choice of $a \in [0,1)$ and $b \in (1,\infty]$. Since $m_1 = m_2 = 1$ the weak feasibility condition (3.1) holds, but (3.14) is violated. We find $\gamma_i = \mu_i$, $\sigma_i \equiv 1$, so that the optimal plan $\gamma$ can be obtained by solving the quadratic optimal transportation problem, thus $\gamma := \boldsymbol t_\sharp \mu_1$ where $\boldsymbol t(x) := (x, x-1)$. In this case the potentials $\varphi_i$ are uniquely determined up to an additive constant $a \in \mathbb R$ so that we have $\varphi_1(x_1) = x_1 + a$, $\varphi_2(x_2) = -x_2 - a - \tfrac12$, and it is clear that condition $-\varphi_i \in \partial F_i(1)$ corresponding to (4.21b) cannot be satisfied, since $\partial F_i(1)$ are always proper subsets of $\mathbb R$. We can also construct entropies such that $\partial F_i(1) = \emptyset$ (e.g. $F_1(r) = (1-r)\log(1-r) + r$, $F_2(r) = (r-1)\log(r-1) - r + 2$) so that (4.21b) can never hold, independently of the cost $c$. □

We conclude this section by proving the simple property on subdifferentials we used in the proof of Theorem 4.15.

Lemma 4.19. Let $F \in \Gamma(\mathbb R_+)$, $s \in \mathrm{Dom}(F)$, let $\phi \in \mathbb R \cup \{\pm\infty\}$ be an accumulation point of a sequence $(\phi_n) \subset \mathbb R$ satisfying
\[
\lim_{n\to\infty} \big(F(s) - s\phi_n + F^*(\phi_n)\big) = 0. \tag{4.49}
\]
If $\phi \in \mathbb R$ then $\phi \in \partial F(s)$, if $\phi = +\infty$ then $s = \max \mathrm{Dom}(F)$ and if $\phi = -\infty$ then $s = \min \mathrm{Dom}(F)$. In particular, if $s \in \mathrm{int}(\mathrm{Dom}(F))$ then $\phi$ is finite.

Proof. Up to extracting a suitable subsequence, it is not restrictive to assume that $\phi$ is the limit of $\phi_n$ as $n \to \infty$. For every $w \in \mathrm{Dom}(F)$ the Young inequality $w\phi_n \le F(w) + F^*(\phi_n)$ yields
\[
\limsup_{n\to\infty} (w-s)\phi_n \le \limsup_{n\to\infty} \Big(F(w) - F(s) + \big(F(s) - s\phi_n + F^*(\phi_n)\big)\Big) = F(w) - F(s). \tag{4.50}
\]
If $\mathrm{Dom}(F) = \{s\}$ then $\partial F(s) = \mathbb R$ and there is nothing to prove; let us thus assume that $\mathrm{Dom}(F)$ has nonempty interior.

If $\phi \in \mathbb R$ then $(w-s)\phi \le F(w) - F(s)$ for every $w \in \mathrm{Dom}(F)$, so that $\phi \in \partial F(s)$. Since the right-hand side of (4.50) is finite for every $w \in \mathrm{Dom}(F)$, if $\phi = +\infty$ then $w \le s$ for every $w \in \mathrm{Dom}(F)$, so that $s = \max \mathrm{Dom}(F)$. An analogous argument holds when $\phi = -\infty$. □
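As a numerical illustration of Lemma 4.19 (a sketch only, not part of the proof), one can take the logarithmic entropy $F(s) = s\log s - s + 1$, whose conjugate is $F^*(\phi) = e^\phi - 1$, and watch the Fenchel gap in (4.49) vanish exactly when $\phi_n$ approaches $\log s$, the unique element of $\partial F(s)$ for $s > 0$. The function names and the sample values below are hypothetical choices made for this example.

```python
import math

def F(s):
    """Logarithmic entropy s log s - s + 1, extended by F(0) = 1."""
    return 1.0 if s == 0.0 else s * math.log(s) - s + 1.0

def F_star(phi):
    """Legendre conjugate of the logarithmic entropy: exp(phi) - 1."""
    return math.exp(phi) - 1.0

def fenchel_gap(s, phi):
    """F(s) - s*phi + F*(phi), nonnegative by the Young inequality."""
    return F(s) - s * phi + F_star(phi)

s = 2.5                      # a point in the interior of Dom(F)
phi_star = math.log(s)       # the only element of the subdifferential at s

# A sequence converging to log(s) drives the gap to zero ...
gaps = [fenchel_gap(s, phi_star + 1.0 / n) for n in (1, 10, 100, 1000)]
# ... while a sequence with a different limit keeps the gap away from zero.
gap_far = fenchel_gap(s, phi_star + 0.5)
```

In this example the gap equals $s(e^{\varepsilon} - 1 - \varepsilon)$ for $\phi = \log s + \varepsilon$, which makes the dichotomy of the lemma visible for an interior point $s$.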

5 “Homogeneous” formulations of optimal Entropy-Transport problems

Starting from the reverse formulation of the Entropy-Transport problem of Section 3.5 via the functional $\mathscr R$, see (3.30), in this section we will derive further equivalent representations of the ET functional, which will also reveal new interesting properties, in particular when we will apply these results to the logarithmic Hellinger-Kantorovich functional. The advantage of the reverse formulation is that it always admits a “1-homogeneous” representation, associated to a modified cost functional that can be explicitly computed in terms of $R_i$ and $c$.

We will always tacitly assume the basic coercive setting of Section 13.11 see (1^ . 

5.1 The homogeneous marginal perspective functional.

First of all we introduce the marginal perspective function $H_c$ depending on the parameter $c \ge \inf c(\cdot,\cdot)$:

Definition 5.1 (Marginal perspective function and cost). For $c \in [0,\infty)$, the marginal perspective function $H_c : [0,\infty) \times [0,\infty) \to [0,+\infty]$ is defined as the lower semicontinuous envelope of
\[
\tilde H_c(r_1,r_2) := \inf_{\theta>0} \theta\big(R_1(r_1/\theta) + R_2(r_2/\theta) + c\big) = \inf_{\theta>0} r_1F_1(\theta/r_1) + r_2F_2(\theta/r_2) + \theta c. \tag{5.1}
\]
For $c = \infty$ we set
\[
H_\infty(r_1,r_2) := F_1(0)r_1 + F_2(0)r_2. \tag{5.2}
\]
The induced marginal perspective cost is $H : (X_1 \times \mathbb R_+) \times (X_2 \times \mathbb R_+) \to [0,+\infty]$ with
\[
H(x_1,r_1;x_2,r_2) := H_{c(x_1,x_2)}(r_1,r_2), \quad \text{for } x_i \in X_i \text{ and } r_i \ge 0. \tag{5.3}
\]
The last formula (5.2) is justified by the property $F_i(0) = (R_i)'_\infty$ and the fact that $H_c(r_1,r_2) \uparrow H_\infty(r_1,r_2)$ as $c \uparrow \infty$ for every $r_1,r_2 \in [0,\infty)$, see also Lemma 5.3 below.

Example 5.2. Let us consider the symmetric cases associated to the entropies $U_p$ and $V$:

E.1 In the “logarithmic entropy case”, which we will extensively study in Part II, we have $F_i(s) := U_1(s) = s\log s - (s-1)$ and $R_i(r) = U_0(r) = r - 1 - \log r$. A direct computation shows
\[
H_c(r_1,r_2) = \tilde H_c(r_1,r_2) = r_1 + r_2 - 2\sqrt{r_1r_2}\,e^{-c/2} = (\sqrt{r_1} - \sqrt{r_2})^2 + 2\sqrt{r_1r_2}\,\big(1 - e^{-c/2}\big). \tag{5.4}
\]

E.2 For $p = 0$, $F_i(s) = U_0(s) = s - \log s - 1$, and $R_i(r) = U_1(r)$ we obtain
\[
H_c(r_1,r_2) = \tilde H_c(r_1,r_2) = r_1\log r_1 + r_2\log r_2 - (r_1+r_2)\log\Big(\frac{r_1+r_2}{2+c}\Big). \tag{5.5}
\]

E.3 In the power-like case with $p \in \mathbb R \setminus \{0,1\}$ we start from
\[
F_i(s) := U_p(s) = \frac{1}{p(p-1)}\big(s^p - p(s-1) - 1\big), \qquad R_i(r) = U_{1-p}(r)
\]
and obtain, for $r_1,r_2 > 0$,
\[
H_c(r_1,r_2) = \tilde H_c(r_1,r_2) = \frac1p\Big[(r_1+r_2) - r_1r_2\,\frac{\big(2-(p-1)c\big)_+^q}{\big(r_1^{p-1} + r_2^{p-1}\big)^{1/(p-1)}}\Big], \tag{5.6}
\]
where $q = p/(p-1)$. In fact, we have
\[
\theta\big(U_{1-p}(r_1/\theta) + U_{1-p}(r_2/\theta) + c\big) = \frac1p(r_1+r_2) + \frac{1}{p-1}\Big[\frac{\theta^p}{p}\big(r_1^{1-p} + r_2^{1-p}\big) - \big(2-(p-1)c\big)\theta\Big]
\]
and (5.6) follows by minimizing w.r.t. $\theta$. E.g. when $p = q = 2$
\[
H_c(r_1,r_2) = \frac{1}{2(r_1+r_2)}\Big((r_1-r_2)^2 + h(c)\,r_1r_2\Big), \tag{5.7}
\]
where $h(c) = c(4-c)$ if $0 \le c \le 2$ and $4$ if $c \ge 2$. For $p = -1$ and $q = 1/2$ equation (5.6) yields
\[
H_c(r_1,r_2) = \tilde H_c(r_1,r_2) = \sqrt{(r_1^2 + r_2^2)(2+2c)} - (r_1+r_2). \tag{5.8}
\]

E.4 In the case of the total variation entropy $V(s) = R(s) = |s-1|$ we easily find
\[
H_c(r_1,r_2) = \tilde H_c(r_1,r_2) = r_1 + r_2 - (2-c)_+(r_1 \wedge r_2) = |r_2 - r_1| + (c \wedge 2)(r_1 \wedge r_2).
\]
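The closed forms in E.1, E.2 and E.4 can be sanity-checked by minimizing the expression in (5.1) numerically over $\theta$. The sketch below does this with a plain golden-section search (our choice of method, not the paper's), on a bounded interval that is assumed to contain the minimizer; the helper names `golden_min` and `perspective_inf` and the sample values are hypothetical.

```python
import math

def golden_min(f, lo, hi, iters=200):
    """Minimum value of a convex (hence unimodal) function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    for _ in range(iters):
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return f(0.5 * (a + b))

def U0(r):
    return r - 1.0 - math.log(r)          # R_i in case E.1

def U1(r):
    return r * math.log(r) - r + 1.0      # R_i in case E.2

def V(r):
    return abs(r - 1.0)                   # R in case E.4

def perspective_inf(R, r1, r2, c):
    """Approximate inf over theta > 0 of theta*(R(r1/theta) + R(r2/theta) + c)."""
    obj = lambda th: th * (R(r1 / th) + R(r2 / th) + c)
    # Assumption: the search interval below contains the minimizer.
    return golden_min(obj, 1e-9, 10.0 * (r1 + r2))

r1, r2, c = 1.3, 0.4, 0.7
e1_closed = r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.exp(-c / 2.0)
e2_closed = (r1 * math.log(r1) + r2 * math.log(r2)
             - (r1 + r2) * math.log((r1 + r2) / (2.0 + c)))
e4_closed = abs(r2 - r1) + min(c, 2.0) * min(r1, r2)

e1_num = perspective_inf(U0, r1, r2, c)
e2_num = perspective_inf(U1, r1, r2, c)
e4_num = perspective_inf(V, r1, r2, c)
```

The map $\theta \mapsto \theta R(r/\theta)$ is the perspective of a convex function and therefore convex, which is what justifies a unimodal search here.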

The following dual characterization of $H_c$ nicely explains the crucial role of $H_c$.

Lemma 5.3 (Dual characterization of $H_c$). For every $c \ge 0$ the function $H_c$ admits the dual representation
\[
H_c(r_1,r_2) = \sup\Big\{r_1\psi_1 + r_2\psi_2 : \psi_i \in \mathrm{Dom}(R_i^*),\ R_1^*(\psi_1) + R_2^*(\psi_2) \le c\Big\} \tag{5.9}
\]
\[
\phantom{H_c(r_1,r_2)} = \sup\Big\{r_1F_1^\circ(\varphi_1) + r_2F_2^\circ(\varphi_2) : \varphi_i \in \mathrm{Dom}(F_i^\circ),\ \varphi_1 + \varphi_2 \le c\Big\}. \tag{5.10}
\]
In particular it is lower semicontinuous, convex and positively 1-homogeneous (thus sublinear) with respect to $(r_1,r_2)$, nondecreasing and concave w.r.t. $c$, and satisfies
\[
H_c(r_1,r_2) \le H_\infty(r_1,r_2) = \sum_i F_i(0)r_i \quad \text{for every } c \ge 0,\ r_i \ge 0. \tag{5.11}
\]
Moreover,

a) the function $H_c$ coincides with $\tilde H_c$ in the interior of its domain; in particular, if $F_i(0) < \infty$ then $H_c(r_1,r_2) = \tilde H_c(r_1,r_2)$ whenever $r_1r_2 > 0$.

b) If $(F_1)'_\infty + (F_2)'_0 + c \ge 0$ and $(F_2)'_\infty + (F_1)'_0 + c \ge 0$, then
\[
H_c(r_1,r_2) = \sum_i F_i(0)r_i \quad \text{if } r_1r_2 = 0. \tag{5.12}
\]


Proof. Since $\sup \mathrm{Dom}(R_i^*) = F_i(0)$ by (2.32), one immediately gets (5.9) in the case $c = +\infty$; we can thus assume $c < +\infty$.

It is not difficult to check that the function $(r_1,r_2,\theta) \mapsto \theta\big(R_1(r_1/\theta) + R_2(r_2/\theta) + c\big)$ is jointly convex in $[0,\infty) \times [0,\infty) \times (0,\infty)$ so that $\tilde H_c$ is a convex and positively 1-homogeneous function. It is also proper (i.e. it is not identically $+\infty$) thanks to (3.1). By Legendre duality [38, Thm. 12.2], its lower semicontinuous envelope is given by
\[
H_c(r_1,r_2) = \sup\Big\{\sum_i \psi_ir_i : \tilde H_c^*(\psi_1,\psi_2) \le 0\Big\}, \tag{5.13}
\]
where
\[
\begin{aligned}
\tilde H_c^*(\psi_1,\psi_2) &= \sup\Big\{\sum_i \psi_ir_i - \tilde H_c(r_1,r_2) : r_i \ge 0\Big\} = \sup_{r_i \ge 0,\ \theta>0} \sum_i \big(\psi_ir_i - \theta R_i(r_i/\theta)\big) - c\theta\\
&= \sup_{\theta>0} \theta\Big(\sum_i R_i^*(\psi_i) - c\Big) = \begin{cases} 0 & \text{if } R_i^*(\psi_i) < \infty,\ \sum_i R_i^*(\psi_i) \le c,\\ +\infty & \text{otherwise.} \end{cases}
\end{aligned}
\]

In order to prove point a) it is sufficient to recall that convex functions are always continuous in the interior of their domain [38, Thm. 10.1]. In particular, since $\lim_{\theta\downarrow0} \theta\big(R_1(r_1/\theta) + R_2(r_2/\theta) + c\big) = \sum_i (R_i)'_\infty r_i = \sum_i F_i(0)r_i$ for every $r_1,r_2 > 0$, we have $\tilde H_c(r_1,r_2) \le \sum_i F_i(0)r_i$, so that $\tilde H_c$ is always finite if $F_i(0) < \infty$.

Concerning b), it is obvious when $r_1 = r_2 = 0$. When $r_1 > r_2 = 0$, the facts that $\sup \mathrm{Dom}(R_1^*) = F_1(0)$, $\lim_{r \uparrow F_1(0)} R_1^*(r) = -(F_1)'_0$, and $\inf R_2^* = -(F_2)'_\infty$ (see (2.32)) yield
\[
H_c(r_1,0) = \sup\Big\{\psi_1r_1 : R_1^*(\psi_1) \le c - \inf R_2^*\Big\} = F_1(0)r_1.
\]
An analogous formula holds when $0 = r_1 < r_2$. □
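The dual representation (5.10) can be tested numerically in the logarithmic case, where $F_i^\circ(\varphi) = 1 - e^{-\varphi}$ and the primal value is $r_1 + r_2 - 2\sqrt{r_1r_2}\,e^{-c/2}$. The following sketch approximates the supremum by a one-dimensional grid in $\varphi_1$; the grid range, the grid size and the function names are assumptions of this example, not part of the lemma.

```python
import math

def H_closed(r1, r2, c):
    """Primal value for the logarithmic entropy, as in case E.1."""
    return r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.exp(-c / 2.0)

def H_dual(r1, r2, c, n=20001, span=10.0):
    """Grid approximation of sup { r1*F°(phi1) + r2*F°(phi2) : phi1 + phi2 <= c }.

    Since F° is increasing, for fixed phi1 the best admissible phi2 is c - phi1,
    so the constrained supremum reduces to a one-dimensional search.
    """
    best = -math.inf
    for k in range(n):
        phi1 = -span + 2.0 * span * k / (n - 1)
        phi2 = c - phi1
        val = r1 * (1.0 - math.exp(-phi1)) + r2 * (1.0 - math.exp(-phi2))
        best = max(best, val)
    return best

r1, r2, c = 1.3, 0.4, 0.7
primal = H_closed(r1, r2, c)
dual = H_dual(r1, r2, c)
```

The grid value can only underestimate the supremum, so the comparison below is one-sided up to the discretization error.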

A simple consequence of Lemma 5.3 and (2.31) is the lower bound
\[
\tilde H_c(r_1,r_2) \ge H_c(r_1,r_2) \ge \sum_i \psi_ir_i \quad \text{for } (-\varphi_i,\psi_i) \in \mathfrak F_i \text{ with } \varphi_1 + \varphi_2 \le c. \tag{5.14}
\]
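We now introduce the integral functional associated with the marginal perspective cost (5.3), which is based on the decomposition $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$:
\[
\mathscr H(\mu_1,\mu_2|\gamma) := \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \sum_i F_i(0)\mu_i^\perp(X_i), \tag{5.15}
\]
where we adopted the same notation as in (3.29). Let us first show that $\mathscr H$ is always greater than $\mathscr D$.

Lemma 5.4. For every $\gamma \in \mathcal M(\boldsymbol X)$, $\mu_i \in \mathcal M(X_i)$, $\varphi \in \Phi$, $\varrho_i \in L^1_+(X_i,\gamma_i)$ with $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$, we have
\[
\int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \sum_i F_i(0)\mu_i^\perp(X_i) \ge \mathscr D(\varphi|\mu_1,\mu_2). \tag{5.16}
\]

Proof. Recalling that $F_i^\circ(\varphi_i) = -F_i^*(-\varphi_i) \le F_i(0)$ and using (5.14) with $r_j = \varrho_j$ and $\psi_j = F_j^\circ(\varphi_j)$ we have
\[
\begin{aligned}
\int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma &+ \sum_i F_i(0)\mu_i^\perp(X_i)\\
&\ge \int_{\boldsymbol X} \Big(F_1^\circ(\varphi_1(x_1))\varrho_1(x_1) + F_2^\circ(\varphi_2(x_2))\varrho_2(x_2)\Big)\,d\gamma + \sum_i F_i(0)\mu_i^\perp(X_i)\\
&= \sum_i \int_{X_i} F_i^\circ(\varphi_i)\varrho_i\,d\gamma_i + \sum_i F_i(0)\mu_i^\perp(X_i)\\
&\ge \sum_i \Big(\int_{X_i} F_i^\circ(\varphi_i)\varrho_i\,d\gamma_i + \int_{X_i} F_i^\circ(\varphi_i)\,d\mu_i^\perp\Big) = \sum_i \int_{X_i} F_i^\circ(\varphi_i)\,d\mu_i = \mathscr D(\varphi|\mu_1,\mu_2).
\end{aligned}
\]
Note that (2.19) and (2.43) imply $F_i^\circ(\varphi_i) \le F_i(0)$. □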

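An immediate consequence of the previous lemma is the following important result concerning the marginal perspective cost functional $\mathscr H$ defined by (5.15). It can be nicely compared to the Reverse Entropy-Transport functional $\mathscr R$ for which Theorem 3.11 stated $\mathscr R(\mu_1,\mu_2|\gamma) = \mathscr E(\gamma|\mu_1,\mu_2)$.

Theorem 5.5. For every $\mu_i \in \mathcal M(X_i)$, $\gamma \in \mathcal M(\boldsymbol X)$ and $\varphi \in \Phi$ we have
\[
\mathscr R(\mu_1,\mu_2|\gamma) \ge \mathscr H(\mu_1,\mu_2|\gamma) \ge \mathscr D(\varphi|\mu_1,\mu_2). \tag{5.17}
\]
In particular
\[
\mathrm{ET}(\mu_1,\mu_2) = \mathrm H(\mu_1,\mu_2) := \min_{\gamma \in \mathcal M(\boldsymbol X)} \mathscr H(\mu_1,\mu_2|\gamma), \tag{5.18}
\]
and $\gamma \in \mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ if and only if it minimizes $\mathscr H(\mu_1,\mu_2|\cdot)$ in $\mathcal M(\boldsymbol X)$ and satisfies
\[
H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2)) = \sum_i R_i(\varrho_i(x_i)) + c(x_1,x_2) \quad \gamma\text{-a.e. in } \boldsymbol X, \tag{5.19}
\]
where $\varrho_i$ is defined as in (2.8). If moreover the following conditions
\[
\begin{aligned}
&F_1(0) = +\infty \ \text{ or there exists } \bar x_2 \in X_2 \text{ with } \mu_2(\{\bar x_2\}) = 0,\\
&F_2(0) = +\infty \ \text{ or there exists } \bar x_1 \in X_1 \text{ with } \mu_1(\{\bar x_1\}) = 0,
\end{aligned} \tag{5.20}
\]
are satisfied, then
\[
\mathrm{ET}(\mu_1,\mu_2) = \min\Big\{\int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma : \gamma \in \mathcal M(\boldsymbol X),\ \mu_i = \varrho_i\gamma_i \ll \gamma_i\Big\}. \tag{5.21}
\]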

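Proof. The inequality $\mathscr R(\mu_1,\mu_2|\gamma) \ge \mathscr H(\mu_1,\mu_2|\gamma)$ is an immediate consequence of the fact that $\sum_i R_i(r_i) + c \ge \tilde H_c(r_1,r_2) \ge H_c(r_1,r_2)$ for every $r_i, c \in [0,\infty]$, obtained by choosing $\theta = 1$ in (5.1). The estimate $\mathscr H(\mu_1,\mu_2|\gamma) \ge \mathscr D(\varphi|\mu_1,\mu_2)$ was shown in Lemma 5.4.

By using the “reverse” formulation of $\mathrm{ET}(\mu_1,\mu_2)$ in terms of the functional $\mathscr R(\mu_1,\mu_2|\gamma)$ given by Theorem 3.11 and applying Theorem 4.11 we obtain (5.18) and the characterization (5.19).

To establish the identity (5.21) we note that the difference to (5.18) only lies in dropping the additional restriction $\mu_i^\perp = 0$. When both $F_1(0) = F_2(0) = +\infty$ the equivalence is obvious since the finiteness of the functional $\gamma \mapsto \mathscr H(\mu_1,\mu_2|\gamma)$ yields $\mu_1^\perp = \mu_2^\perp = 0$.

In the general case, one immediately sees that the right-hand side $E'$ of (5.21) (with “inf” instead of “min”) is larger than $\mathrm{ET}(\mu_1,\mu_2)$, since the infimum of $\mathscr H(\mu_1,\mu_2|\cdot)$ is constrained to the smaller set of plans $\gamma$ satisfying $\mu_i \ll \gamma_i$. On the other hand, if $\gamma \in \mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ with $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$ and $\tilde m_i := \mu_i^\perp(X_i) > 0$, we can consider $\tilde\gamma := \gamma + \frac{1}{\tilde m_1\tilde m_2}\mu_1^\perp \otimes \mu_2^\perp$ which satisfies $\mu_i \ll \tilde\gamma_i$; by exploiting the fact that $H(x_1,r_1;x_2,r_2) \le \sum_i F_i(0)r_i$ by (5.11), we obtain
\[
\begin{aligned}
\mathscr H(\mu_1,\mu_2|\tilde\gamma) &= \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \frac{1}{\tilde m_1\tilde m_2}\int_{\boldsymbol X} H(x_1,\tilde m_1;x_2,\tilde m_2)\,d\mu_1^\perp \otimes \mu_2^\perp\\
&\le \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \sum_i F_i(0)\tilde m_i = \mathscr H(\mu_1,\mu_2|\gamma),
\end{aligned}
\]
so that we have $E' \le \mathrm{ET}(\mu_1,\mu_2)$. The case when only one (say $\mu_2^\perp$) of the measures $\mu_i^\perp$ vanishes can be treated in the same way: since in this case $\tilde m_1 = \mu_1^\perp(X_1) > 0$ and therefore $F_1(0) < \infty$, by applying (5.20) we can choose $\tilde\gamma := \gamma + \frac{1}{\tilde m_1}\mu_1^\perp \otimes \delta_{\bar x_2}$, obtaining
\[
\begin{aligned}
\mathscr H(\mu_1,\mu_2|\tilde\gamma) &= \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \frac{1}{\tilde m_1}\int_{X_1} H(x_1,\tilde m_1;\bar x_2,0)\,d\mu_1^\perp\\
&\le \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + F_1(0)\tilde m_1 = \mathscr H(\mu_1,\mu_2|\gamma). \qquad □
\end{aligned}
\]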


Remark 5.6. Notice that (5.20) is always satisfied if the spaces $X_i$ are uncountable. If $X_i$ is countable, one can always add an isolated point $\bar x_i$ (sometimes called “cemetery”) to $X_i$ and consider the augmented space $\bar X_i = X_i \sqcup \{\bar x_i\}$ obtained as the disjoint union of $X_i$ and $\bar x_i$, with augmented cost $\bar c$ which extends $c$ to $+\infty$ on $\bar X_1 \times \bar X_2 \setminus (X_1 \times X_2)$. We can recover (5.21) by allowing $\gamma$ in $\mathcal M(\bar X_1 \times \bar X_2)$. □
5.2 Entropy-transport problems with “homogeneous” marginal constraints

In this section we will exploit the 1-homogeneity of the marginal perspective function $H$ in order to derive a last representation of the functional ET, related to the new notion of homogeneous marginals. We will confine our presentation to the basic, still relevant, facts, and we will devote the second part of the paper to developing a full theory for the specific case of the Logarithmic Entropy-Transport problem.

In particular, the following construction (typical in the Young measure approach to variational problems) allows us to consider the entropy-transport problems in a setting of greater generality. We replace a couple $(\gamma,\varrho)$, where $\gamma$ and $\varrho$ are a measure on $X$ and a nonnegative Borel function, respectively, by a measure $\alpha \in \mathcal M(Y)$ on the extended space $Y = X \times [0,\infty)$. The original couple $(\gamma,\varrho)$ corresponds to measures $\alpha = (x,\varrho(x))_\sharp\gamma$ concentrated on the graph of $\varrho$ in $Y$ and whose first marginal is $\gamma$.


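Homogeneous marginals. In the usual setting of Section 3.1, we consider the product spaces $Y_i := X_i \times [0,\infty)$ endowed with the product topology and denote the generic points in $Y_i$ with $y_i = (x_i,r_i)$, $x_i \in X_i$ and $r_i \in [0,\infty)$ for $i = 1,2$. Projections from $\boldsymbol Y := Y_1 \times Y_2$ onto the various coordinates will be denoted by $\pi^{y_i}$, $\pi^{x_i}$, $\pi^{r_i}$ with obvious meaning.

For $p > 0$ and $\boldsymbol y \in \boldsymbol Y$ we will set $|\boldsymbol y|_p^p := \sum_i |r_i|^p$ and call $\mathcal M_p(\boldsymbol Y)$ (resp. $\mathcal P_p(\boldsymbol Y)$) the space of measures $\alpha \in \mathcal M(\boldsymbol Y)$ (resp. $\mathcal P(\boldsymbol Y)$) such that
\[
\int_{\boldsymbol Y} |\boldsymbol y|_p^p\,d\alpha < \infty. \tag{5.22}
\]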


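If $\alpha \in \mathcal M_p(\boldsymbol Y)$ the measures $r_i^p\alpha$ belong to $\mathcal M(\boldsymbol Y)$, which allows us to define the “$p$-homogeneous” marginal $h_i^p(\alpha)$ of $\alpha \in \mathcal M_p(\boldsymbol Y)$ as the $x_i$-marginal of $r_i^p\alpha$, namely
\[
h_i^p(\alpha) := \pi^{x_i}_\sharp(r_i^p\alpha) \in \mathcal M(X_i). \tag{5.23}
\]
The maps $h_i^p : \mathcal M_p(\boldsymbol Y) \to \mathcal M(X_i)$ are linear and invariant with respect to dilations: if $\vartheta : \boldsymbol Y \to (0,\infty)$ is a Borel map in $L^p(\boldsymbol Y,\alpha)$ and $\mathrm{prd}_\vartheta(\boldsymbol y) := (x_1,r_1/\vartheta(\boldsymbol y);x_2,r_2/\vartheta(\boldsymbol y))$, we set
\[
\mathrm{dil}_{\vartheta,p}(\alpha) := (\mathrm{prd}_\vartheta)_\sharp(\vartheta^p\alpha), \quad \text{i.e.} \quad \int \varphi(\boldsymbol y)\,d\big(\mathrm{dil}_{\vartheta,p}(\alpha)\big) = \int \varphi(x_1,r_1/\vartheta;x_2,r_2/\vartheta)\,\vartheta^p(\boldsymbol y)\,d\alpha(\boldsymbol y) \quad \text{for } \varphi \in B_b(\boldsymbol Y). \tag{5.24}
\]
Using (5.23) we obviously have
\[
h_i^p\big(\mathrm{dil}_{\vartheta,p}(\alpha)\big) = h_i^p(\alpha). \tag{5.25}
\]
In particular, for $\alpha \in \mathcal M_p(\boldsymbol Y)$ with $\alpha(\boldsymbol Y) > 0$, by choosing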


^{y) ■= 



(5.26a) 


if |?/|p = 0 , ^Jy > 

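we obtain a rescaled probability measure $\tilde\alpha$ with the same homogeneous marginals as $\alpha$ and concentrated on $\boldsymbol Y_{r_*,p} := \{\boldsymbol y \in \boldsymbol Y : |\boldsymbol y|_p \le r_*\} \subset (X_1 \times [0,r_*]) \times (X_2 \times [0,r_*])$:
\[
\tilde\alpha = \mathrm{dil}_{\vartheta,p}(\alpha) \in \mathcal P_p(\boldsymbol Y), \quad h_i^p(\tilde\alpha) = h_i^p(\alpha), \quad \tilde\alpha(\boldsymbol Y \setminus \boldsymbol Y_{r_*,p}) = 0. \tag{5.26b}
\]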
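The invariance (5.25) is easy to see on discrete measures: a dilation divides the radial coordinates by $\vartheta$ and multiplies the mass by $\vartheta^p$, so the weights $r_i^p \cdot \text{mass}$ entering the homogeneous marginals do not change. The sketch below encodes a finitely supported plan as a list of atoms; this representation, the function names and the sample dilation are assumptions made only for the illustration.

```python
from collections import defaultdict

def hom_marginal(alpha, i, p):
    """p-homogeneous marginal of a discrete plan, in the sense of (5.23).

    alpha is a list of atoms (x1, r1, x2, r2, mass); the result maps each
    point x_i to the total weight r_i**p * mass carried by atoms over it.
    """
    out = defaultdict(float)
    for x1, r1, x2, r2, m in alpha:
        x, r = (x1, r1) if i == 1 else (x2, r2)
        out[x] += (r ** p) * m
    return dict(out)

def dilate(alpha, theta, p):
    """Discrete version of dil_{theta,p}: rescale radii by 1/theta, mass by theta**p."""
    res = []
    for x1, r1, x2, r2, m in alpha:
        t = theta((x1, r1, x2, r2))   # theta must be strictly positive
        res.append((x1, r1 / t, x2, r2 / t, m * t ** p))
    return res

p = 2
alpha = [("a", 1.0, "u", 2.0, 0.5),
         ("a", 3.0, "v", 0.5, 1.5),
         ("b", 0.0, "u", 1.0, 2.0)]
theta = lambda y: 1.0 + y[1] + y[3]   # an arbitrary positive dilation factor
alpha_dil = dilate(alpha, theta, p)

h1_before, h1_after = hom_marginal(alpha, 1, p), hom_marginal(alpha_dil, 1, p)
h2_before, h2_after = hom_marginal(alpha, 2, p), hom_marginal(alpha_dil, 2, p)
```

The total mass of the plan does change under the dilation, which is exactly the freedom used to normalize $\alpha$ to a probability measure.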
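Entropy-transport problems with prescribed homogeneous marginals. Given $\mu_i \in \mathcal M(X_i)$ we now introduce the convex sets
\[
\begin{aligned}
\mathcal H^p_\le(\mu_1,\mu_2) &:= \big\{\alpha \in \mathcal M_p(\boldsymbol Y) : h_i^p(\alpha) \le \mu_i\big\},\\
\mathcal H^p_=(\mu_1,\mu_2) &:= \big\{\alpha \in \mathcal M_p(\boldsymbol Y) : h_i^p(\alpha) = \mu_i\big\}.
\end{aligned} \tag{5.27}
\]
Clearly $\mathcal H^p_=(\mu_1,\mu_2) \subset \mathcal H^p_\le(\mu_1,\mu_2)$ and they are nonempty since plans of the form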


a. = 


P P 


hi ® 


h2 O 5a 


with oi, 02 > 0 


( 5 . 28 ) 


belong to $\mathcal H^p_=(\mu_1,\mu_2)$. It is not difficult to check that $\mathcal H^p_\le(\mu_1,\mu_2)$ is also narrowly closed, while, on the contrary, this property fails for $\mathcal H^p_=(\mu_1,\mu_2)$ if $\mu_1(X_1)\mu_2(X_2) \ne 0$. To see this, it is sufficient to consider any $\alpha \in \mathcal H^p_=(\mu_1,\mu_2) \setminus \{0\}$ and look at the vanishing sequence $\mathrm{dil}_{n^{-1/p},p}(\alpha)$ for $n \to \infty$.

There is a natural correspondence between $\mathcal H^p_\le(\mu_1,\mu_2)$ (resp. $\mathcal H^p_=(\mu_1,\mu_2)$) and $\mathcal H^1_\le(\mu_1,\mu_2)$ (resp. $\mathcal H^1_=(\mu_1,\mu_2)$) induced by the map $\boldsymbol Y \ni (x_1,r_1;x_2,r_2) \mapsto (x_1,r_1^p;x_2,r_2^p)$. For plans $\alpha \in \mathcal H^p_\le(\mu_1,\mu_2)$ we can prove a result similar to Lemma 5.4 but now we obtain a linear functional in $\alpha$.

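Lemma 5.7. For $p \in (0,\infty)$, $\mu_i \in \mathcal M(X_i)$, $\varphi \in \Phi$ and $\alpha \in \mathcal H^p_\le(\mu_1,\mu_2)$ we have
\[
\int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\alpha + \sum_i F_i(0)\mu_i'(X_i) \ge \mathscr D(\varphi|\mu_1,\mu_2), \quad \text{where } \mu_i' := \mu_i - h_i^p\alpha. \tag{5.29}
\]

Proof. The calculations are quite similar to the proof of Lemma 5.4:
\[
\begin{aligned}
\int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\alpha &+ \sum_i F_i(0)\mu_i'(X_i)\\
&\ge \int_{\boldsymbol Y} \Big(F_1^\circ(\varphi_1(x_1))r_1^p + F_2^\circ(\varphi_2(x_2))r_2^p\Big)\,d\alpha + \sum_i F_i(0)\mu_i'(X_i)\\
&= \sum_i \int_{X_i} F_i^\circ(\varphi_i)\,d(h_i^p\alpha) + \sum_i F_i(0)\mu_i'(X_i)\\
&\ge \sum_i \int_{X_i} F_i^\circ(\varphi_i)\,d(h_i^p\alpha) + \sum_i \int_{X_i} F_i^\circ(\varphi_i)\,d\mu_i' = \sum_i \int_{X_i} F_i^\circ(\varphi_i)\,d\mu_i = \mathscr D(\varphi|\mu_1,\mu_2). \qquad □
\end{aligned}
\]

As a consequence, we can characterize the entropy-transport minimum via measures $\alpha \in \mathcal M(\boldsymbol Y)$.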

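Theorem 5.8. For every $\mu_i \in \mathcal M(X_i)$, $p \in (0,\infty)$ we have
\[
\mathrm{ET}(\mu_1,\mu_2) = \min_{\alpha \in \mathcal H^p_\le(\mu_1,\mu_2)} \int_{\boldsymbol Y} \Big(\sum_i R_i(r_i^p) + c(x_1,x_2)\Big)\,d\alpha + \sum_i F_i(0)\big(\mu_i - h_i^p(\alpha)\big)(X_i) \tag{5.30}
\]
\[
\phantom{\mathrm{ET}(\mu_1,\mu_2)} = \min_{\alpha \in \mathcal H^p_\le(\mu_1,\mu_2)} \int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\alpha + \sum_i F_i(0)\big(\mu_i - h_i^p(\alpha)\big)(X_i) \tag{5.31}
\]
\[
\phantom{\mathrm{ET}(\mu_1,\mu_2)} = \min_{\alpha \in \mathcal H^p_=(\mu_1,\mu_2)} \int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\alpha. \tag{5.32}
\]
Moreover, for every plan $\gamma \in \mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ (resp. optimal for (5.18) or for (5.21)) with $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$, the plan $\alpha := (x_1,\varrho_1^{1/p}(x_1);x_2,\varrho_2^{1/p}(x_2))_\sharp\gamma$ realizes the minimum of (5.30) (resp. (5.31) or (5.32)).

Remark 5.9. When $F_i(0) = +\infty$, (5.30) and (5.31) simply read as
\[
\mathrm{ET}(\mu_1,\mu_2) = \min_{\alpha \in \mathcal H^p_=(\mu_1,\mu_2)} \int_{\boldsymbol Y} \Big(\sum_i R_i(r_i^p) + c(x_1,x_2)\Big)\,d\alpha = \min_{\alpha \in \mathcal H^p_=(\mu_1,\mu_2)} \int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\alpha. \qquad □
\]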

Proof of Theorem 5.8. Let us denote by $E'$ (resp. $E''$, $E'''$) the right-hand side of (5.30) (resp. of (5.31), (5.32)), where “min” has been replaced by “inf”. If $\gamma \in \mathcal M(\boldsymbol X)$ and $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$ (in the case of (5.32) $\mu_i^\perp = 0$) is the usual Lebesgue decomposition as in (3.29), we can consider the plan $\alpha := (x_1,\varrho_1^{1/p}(x_1);x_2,\varrho_2^{1/p}(x_2))_\sharp\gamma$.

Since the map ^ -G is Borel and takes values in a metrizable and 

separable space, it is Lusin 7 -measurable m Thm 5, p. 26], so that a is a Radon 
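measure in $\mathcal M(\boldsymbol Y)$. For every nonnegative $\phi_i \in B_b(X_i)$ we easily get
\[
\int_{\boldsymbol Y} \phi_i(x_i)r_i^p\,d\alpha = \int_{\boldsymbol X} \varrho_i(x_i)\phi_i(x_i)\,d\gamma = \int_{X_i} \varrho_i\phi_i\,d\gamma_i \le \int_{X_i} \phi_i\,d\mu_i,
\]
so that $\alpha \in \mathcal H^p_\le(\mu_1,\mu_2)$, $h_i^p\alpha = \varrho_i\gamma_i$, and
\[
\begin{aligned}
\mathscr R(\mu_1,\mu_2|\gamma) &= \int_{\boldsymbol X} \Big(\sum_i R_i(\varrho_i(x_i)) + c(x_1,x_2)\Big)\,d\gamma + \sum_i F_i(0)\mu_i^\perp(X_i)\\
&= \int_{\boldsymbol Y} \Big(\sum_i R_i(r_i^p) + c(x_1,x_2)\Big)\,d\alpha + \sum_i F_i(0)\big(\mu_i - h_i^p(\alpha)\big)(X_i) \ge E';
\end{aligned}
\]
taking the infimum w.r.t. $\gamma$ and recalling (3.32) we get $\mathrm{ET}(\mu_1,\mu_2) \ge E'$. Since $\sum_i R_i(r_i^p) + c(x_1,x_2) \ge H(x_1,r_1^p;x_2,r_2^p)$ it is also clear that $E' \ge E''$.

On the other hand, Lemma 5.7 shows that $E'' \ge \mathscr D(\varphi|\mu_1,\mu_2)$ for every $\varphi \in \Phi$; applying Theorem 4.11 we get $\mathrm{ET}(\mu_1,\mu_2) = E' = E''$.

Concerning $E'''$ it is clear that $E''' \ge E'' = \mathrm{ET}(\mu_1,\mu_2)$; when (5.20) holds, by choosing $\alpha$ induced by a minimizer of (5.21) we get the opposite inequality $E''' \le \mathrm{ET}(\mu_1,\mu_2)$.

If (5.20) does not hold, we can still apply a slight modification of the argument at the end of the proof of Theorem 5.5. The only case to consider is when only one of the two measures $\mu_i^\perp$ vanishes: just to fix the ideas, let us suppose that $\tilde m_1 = \mu_1^\perp(X_1) > 0 = \mu_2^\perp(X_2)$. If $\gamma \in \mathrm{Opt}_{\mathrm{ET}}(\mu_1,\mu_2)$ and $\alpha$ is obtained as above, we can just set $\bar\alpha := \alpha + (\mu_1^\perp \times \delta_1) \times (\nu \times \delta_0)$ for an arbitrary $\nu \in \mathcal P(X_2)$. It is clear that $h_i^p\bar\alpha = \mu_i$ and
\[
\begin{aligned}
\int_{\boldsymbol Y} H(x_1,r_1^p;x_2,r_2^p)\,d\bar\alpha &= \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + \int_{\boldsymbol X} H(x_1,1;x_2,0)\,d\mu_1^\perp \otimes \nu\\
&\le \int_{\boldsymbol X} H(x_1,\varrho_1(x_1);x_2,\varrho_2(x_2))\,d\gamma + F_1(0)\tilde m_1 = \mathscr H(\mu_1,\mu_2|\gamma) = \mathrm{ET}(\mu_1,\mu_2),
\end{aligned}
\]
where the inequality follows from (5.11), which yields $E''' \le \mathrm{ET}(\mu_1,\mu_2)$. □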
Remark 5.10 (Rescaling invariance). By recalling (5.26a,b) and exploiting the 1-homogeneity of $H$ it is not restrictive to solve the minimum problem (5.31) in the smaller class of probability plans concentrated in
\[
\boldsymbol Y_{r_*,p} := \Big\{(x_1,r_1;x_2,r_2) \in \boldsymbol Y : \sum_i r_i^p \le r_*^p\Big\}.
\]
Notice that it is not restrictive to assume that $\alpha(\{\boldsymbol y \in \boldsymbol Y : |\boldsymbol y| = 0\}) = 0$ since $H(x_1,0;x_2,0) = 0$ for every $x_i \in X_i$. □


Part II. The Logarithmic Entropy-Transport problem 
and the Hellinger-Kantorovich distance 

6 The Logarithmic Entropy-Transport (LET) problem

Starting from this section we will study a particular Entropy-Transport problem, whose 
structure reveals surprising properties. 


6.1 The metric setting for Logarithmic Entropy-Transport problems.


Let $(X,\tau)$ be a Hausdorff topological space endowed with an extended distance function $d : X \times X \to [0,\infty]$ which is lower semicontinuous w.r.t. $\tau$; we refer to $(X,\tau,d)$ as an extended metric-topological space. In the most common situations, $d$ will take finite values, $(X,d)$ will be separable and complete and $\tau$ will be the topology induced by $d$; nevertheless, there are interesting applications where nonseparable extended distances
play an important role, so that it will be useful to deal with an auxiliary topology, see 

e.g. [31 m. 

From now on we suppose that $X_1 = X_2 = X$, we choose the logarithmic entropies
\[
F_i(s) = U_1(s) := s\log s - s + 1, \tag{6.1}
\]
and a cost $c$ depending on the distance $d$ through the function $\ell : [0,\infty] \to [0,\infty]$ via
\[
c(x_1,x_2) := \ell(d(x_1,x_2)), \qquad \ell(d) := \begin{cases} \log\big(1 + \tan^2(d)\big) & \text{if } d \in [0,\pi/2),\\ +\infty & \text{if } d \ge \pi/2, \end{cases} \tag{6.2}
\]
so that
\[
c(x_1,x_2) = \begin{cases} -\log\big(\cos^2(d(x_1,x_2))\big) & \text{if } d(x_1,x_2) < \pi/2,\\ +\infty & \text{otherwise.} \end{cases} \tag{6.3}
\]

Let us collect a few key properties that will be relevant in the sequel.

LE.1 $F_i$ are superlinear, regular, strictly convex, with $\mathrm{Dom}(F_i) = [0,\infty)$, $F_i(0) = 1$, and $(F_i)'_0 = -\infty$. For $s > 0$ we have $\partial F_i(s) = \{\log s\}$.

LE.2 $R_i(r) = rF_i(1/r) = r - 1 - \log r$, $R_i(0) = +\infty$, $(R_i)'_\infty = 1$.

LE.3 $F_i^*(\phi) = \exp(\phi) - 1$, $F_i^\circ(\varphi) = 1 - \exp(-\varphi)$, $\mathrm{Dom}(F_i^*) = \mathrm{Dom}(F_i^\circ) = \mathbb R$.

LE.4 $R_i^*(\psi) = -\log(1-\psi)$ for $\psi < 1$ and $R_i^*(\psi) = +\infty$ for $\psi \ge 1$.

LE.5 The function $\ell$ can be characterized as the unique solution of the differential equation
\[
\ell''(d) = 2\exp(\ell(d)), \qquad \ell(0) = \ell'(0) = 0, \tag{6.4}
\]
since it satisfies
\[
\ell(d) = -\log\big(\cos^2(d)\big) = 2\int_0^d \tan(s)\,ds, \qquad d \in [0,\pi/2), \tag{6.5}
\]
so that
\[
\ell(d) \ge d^2, \qquad \ell'(d) = 2\tan d \ge 2d, \qquad \ell''(d) = 2\big(1 + \tan^2(d)\big) = 2\exp(\ell(d)) \ge 2. \tag{6.6}
\]
In particular $\ell$ is strictly increasing and uniformly 2-convex. It is not difficult to check that $\sqrt\ell$ is also convex: this property is equivalent to $2\ell\ell'' \ge (\ell')^2$ and a direct calculation shows
\[
2\ell\ell'' - (\ell')^2 = 4\log\big(1 + \tan^2(d)\big)\big(1 + \tan^2(d)\big) - 4\tan^2(d) \ge 0
\]
since $(1+r)\log(1+r) \ge r$.

LE.6 $H_c(r_1,r_2) = r_1 + r_2 - 2\sqrt{r_1r_2}\exp(-c/2)$ for $c < \infty$, so that
\[
H(x_1,r_1;x_2,r_2) = r_1 + r_2 - 2\sqrt{r_1r_2}\cos\big(d_{\pi/2}(x_1,x_2)\big), \tag{6.7}
\]
where we set
\[
d_a(x_1,x_2) := d(x_1,x_2) \wedge a \quad \text{for } x_i \in X,\ a \ge 0. \tag{6.8}
\]
Since the function
\[
H(x_1,r_1^2;x_2,r_2^2) = r_1^2 + r_2^2 - 2r_1r_2\cos\big(d_{\pi/2}(x_1,x_2)\big) \tag{6.9}
\]
will have an important geometric interpretation (see Section 7.1), in the following we will choose the exponent $p = 2$ in the setting of Section 5.2.
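The identities collected in LE.5 and LE.6 are elementary and can be confirmed numerically. The sketch below samples $d$ in $(0, 1.5) \subset [0,\pi/2)$ and checks (6.5), the lower bound $\ell(d) \ge d^2$ from (6.6), the differential equation (6.4) through a central finite difference, and the agreement of (6.7) with the exponential form in LE.6. The step size, tolerances and function names are choices made for this example.

```python
import math

def ell(d):
    """Cost profile of (6.2): log(1 + tan^2 d) on [0, pi/2), +inf beyond."""
    return math.log1p(math.tan(d) ** 2) if d < math.pi / 2 else math.inf

def H_log(r1, r2, d):
    """Marginal perspective cost (6.7), with the distance truncated at pi/2."""
    return r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.cos(min(d, math.pi / 2))

h = 1e-4                                   # finite-difference step
samples = [0.01 * k for k in range(1, 150)]
checks = []
for d in samples:
    second = (ell(d + h) - 2.0 * ell(d) + ell(d - h)) / h ** 2
    checks.append((
        abs(ell(d) + math.log(math.cos(d) ** 2)) < 1e-9,                 # (6.5)
        ell(d) >= d * d,                                                  # (6.6)
        abs(second - 2.0 * math.exp(ell(d))) < 1e-3 * math.exp(ell(d)),   # (6.4)
    ))

# LE.6: the exponential form and the cosine form of H coincide.
r1, r2 = 1.3, 0.4
le6_gap = max(
    abs(H_log(r1, r2, d) - (r1 + r2 - 2.0 * math.sqrt(r1 * r2) * math.exp(-ell(d) / 2.0)))
    for d in samples + [2.0]               # d = 2.0 > pi/2 exercises the truncation
)
```

For $d \ge \pi/2$ the cost is $+\infty$, the exponential vanishes and both forms reduce to $r_1 + r_2$.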

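We keep the usual notation $\boldsymbol X = X \times X$, identifying $X_1$ and $X_2$ with $X$ and letting the index $i$ run between 1 and 2, e.g. for $\gamma \in \mathcal M(\boldsymbol X)$ the marginals are denoted by $\gamma_i = (\pi^i)_\sharp\gamma$.

Problem 6.1 (The Logarithmic Entropy-Transport problem). Let $(X,\tau,d)$ be an extended metric-topological space, $\ell$ and $c$ be as in (6.2). Given $\mu_i \in \mathcal M(X)$ find $\gamma \in \mathcal M(\boldsymbol X)$ minimizing
\[
\mathrm{LET}(\mu_1,\mu_2) = \min_{\gamma \in \mathcal M(\boldsymbol X)} \Big(\sum_i \int_X \big(\sigma_i\log\sigma_i - \sigma_i + 1\big)\,d\mu_i + \int_{\boldsymbol X} \ell(d(x_1,x_2))\,d\gamma\Big), \quad \text{where } \sigma_i = \frac{d\gamma_i}{d\mu_i}. \tag{6.10}
\]
We denote by $\mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ the set of all the minimizers $\gamma$ in (6.10).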
6.2 The Logarithmic Entropy-Transport problem: main results

In the next theorem we collect the main properties of the Logarithmic Entropy-Transport (LET) problem relying on the reverse functional $\mathscr R$ from Section 3.5, cf. (3.30), and $\mathscr H$ from Section 5.1, cf. (5.15).

Theorem 6.2 (Direct formulation of the LET problem). Let $\mu_i \in \mathcal M(X)$ be given and let $\ell$, $d_{\pi/2}$ be defined as in (6.2) and (6.8).

a) Existence of optimal plans. There exists an optimal plan $\gamma \in \mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ solving Problem 6.1. The set $\mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ is convex and compact in $\mathcal M(\boldsymbol X)$, LET is a convex and positively 1-homogeneous functional (see (4.41)) satisfying $0 \le \mathrm{LET}(\mu_1,\mu_2) \le \sum_i \mu_i(X)$.

b) Reverse formulation ($\mathrm{LET} = \mathscr R_{\mathrm{LE}}$). The functional LET has the equivalent reverse formulation as
\[
\mathrm{LET}(\mu_1,\mu_2) = \min\Big\{\mathscr R_{\mathrm{LE}}(\mu_1,\mu_2|\gamma) : \gamma \in \mathcal M(\boldsymbol X),\ \mu_i = \varrho_i\gamma_i + \mu_i^\perp\Big\}, \quad \text{where} \tag{6.11}
\]
\[
\mathscr R_{\mathrm{LE}}(\mu_1,\mu_2|\gamma) := \sum_i \Big(\mu_i^\perp(X) + \int_X \big(\varrho_i - 1 - \log\varrho_i\big)\,d\gamma_i\Big) + \int_{\boldsymbol X} \ell(d(x_1,x_2))\,d\gamma,
\]
and $\bar\gamma$ is an optimal plan in $\mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ if and only if it minimizes (6.11).

c) The homogeneous perspective formulation ($\mathrm{LET} = \mathscr H_{\mathrm{LE}}$). The functional $\mathrm{LET}(\mu_1,\mu_2)$ can be equivalently characterized as
\[
\mathrm{LET}(\mu_1,\mu_2) = \min\Big\{\mathscr H_{\mathrm{LE}}(\mu_1,\mu_2|\gamma) : \gamma \in \mathcal M(\boldsymbol X)\Big\}, \quad \text{where} \tag{6.12}
\]
\[
\begin{aligned}
\mathscr H_{\mathrm{LE}}(\mu_1,\mu_2|\gamma) &:= \sum_i \mu_i(X) - 2\int_{\boldsymbol X} \sqrt{\varrho_1(x_1)\varrho_2(x_2)}\,\cos\big(d_{\pi/2}(x_1,x_2)\big)\,d\gamma\\
&= \sum_i \mu_i^\perp(X) + \int_{\boldsymbol X} \Big(\varrho_1(x_1) + \varrho_2(x_2) - 2\sqrt{\varrho_1(x_1)\varrho_2(x_2)}\,\cos\big(d_{\pi/2}(x_1,x_2)\big)\Big)\,d\gamma
\end{aligned}
\]
and $\mu_i = \varrho_i\gamma_i + \mu_i^\perp$. Moreover, every plan $\bar\gamma \in \mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ provides a solution to (6.12).

Proof. The variational problem (6.10) fits in the class considered by Problem 3.1, in the basic coercive setting of Section 3.1 since the logarithmic entropy (6.1) is superlinear with domain $[0,\infty)$. The problem is always feasible since $U_1(0) = 1$ so that (3.6) holds.

a) follows by Theorem 3.3(i); the upper bound of LET is a particular case of (3.7), and its convexity and 1-homogeneity follow by Corollary 4.13.

b) is a consequence of Theorem 3.11.

c) is an application of Theorem 5.5 and (6.7). □
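On a finite set the two functionals of Theorem 6.2 become finite sums, and the inequality $\mathscr R_{\mathrm{LE}} \ge \mathscr H_{\mathrm{LE}} \ge 0$ (the logarithmic instance of (5.17)) can be observed directly. The sketch below takes three points on the real line, two discrete measures and an arbitrary, not optimized, plan; all data and function names are hypothetical and serve only to illustrate the formulas.

```python
import math

def ell(d):
    return math.log1p(math.tan(d) ** 2) if d < math.pi / 2 else math.inf

def R(r):
    """Reverse logarithmic entropy r - 1 - log r, with R(0) = +inf."""
    return math.inf if r == 0.0 else r - 1.0 - math.log(r)

def marginal(gamma, axis, n):
    m = [0.0] * n
    for idx, w in gamma.items():
        m[idx[axis]] += w
    return m

def densities(mu, g):
    """Return (rho, perp_mass) with mu = rho * g + mu_perp on a finite set."""
    rho = [m / w if w > 0 else 0.0 for m, w in zip(mu, g)]
    perp = sum(m for m, w in zip(mu, g) if w == 0)
    return rho, perp

def reverse_and_perspective(pts, mu1, mu2, gamma):
    """Evaluate the reverse and the perspective functional on a discrete plan."""
    n = len(pts)
    g1, g2 = marginal(gamma, 0, n), marginal(gamma, 1, n)
    rho1, perp1 = densities(mu1, g1)
    rho2, perp2 = densities(mu2, g2)
    rev = perp1 + perp2
    rev += sum(R(r) * w for r, w in zip(rho1, g1) if w > 0)
    rev += sum(R(r) * w for r, w in zip(rho2, g2) if w > 0)
    per = perp1 + perp2
    for (i, j), w in gamma.items():
        d = abs(pts[i] - pts[j])
        rev += ell(d) * w
        root = math.sqrt(rho1[i] * rho2[j])
        per += (rho1[i] + rho2[j] - 2.0 * root * math.cos(min(d, math.pi / 2))) * w
    return rev, per

pts = [0.0, 0.4, 1.0]
mu1 = [1.5, 0.3, 0.0]
mu2 = [0.0, 0.8, 1.2]
gamma = {(0, 1): 0.6, (1, 1): 0.3, (1, 2): 0.2, (0, 2): 0.4}
rev, per = reverse_and_perspective(pts, mu1, mu2, gamma)
```

Minimizing either quantity over all plans would give the same value $\mathrm{LET}(\mu_1,\mu_2)$; the sketch only compares them at one fixed plan.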

We consider now the dual representation of LET; recall that $\mathrm{LSC}_s(X)$ denotes the space of simple (i.e. taking a finite number of values) lower semicontinuous functions and for a couple $\phi_i : X \to \mathbb R$ the symbol $\phi_1 \oplus \phi_2$ denotes the function $(x_1,x_2) \mapsto \phi_1(x_1) + \phi_2(x_2)$ defined in $\boldsymbol X$. Part a) relates to the duality results of Section 4, whereas b)-d) discuss the optimality conditions from Section 4.4.
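Theorem 6.3 (Dual formulation and optimality conditions).

a) The dual problem ($\mathrm{LET} = \mathsf D_{\mathrm{LE}} = \mathsf D'_{\mathrm{LE}}$). For all $\mu_1,\mu_2 \in \mathcal M(X)$ we have
\[
\mathrm{LET}(\mu_1,\mu_2) = \sup\Big\{\mathscr D_{\mathrm{LE}}(\varphi|\mu_1,\mu_2) : \varphi_i \in \mathrm{LSC}_s(X),\ \varphi_1 \oplus \varphi_2 \le \ell(d)\Big\} \tag{6.13}
\]
\[
\phantom{\mathrm{LET}(\mu_1,\mu_2)} = \sup\Big\{\sum_i \int_X \psi_i\,d\mu_i : \psi_i \in \mathrm{LSC}_s(X),\ \sup_X \psi_i < 1,\ (1-\psi_1(x_1))(1-\psi_2(x_2)) \ge \cos^2\big(d_{\pi/2}(x_1,x_2)\big) \text{ in } \boldsymbol X\Big\}, \tag{6.14}
\]
where $\mathscr D_{\mathrm{LE}}(\varphi|\mu_1,\mu_2) := \sum_i \int_X \big(1 - e^{-\varphi_i}\big)\,d\mu_i$. The same identities hold if the space $\mathrm{LSC}_s(X)$ is replaced by $\mathrm{LSC}_b(X)$ or $B_b(X)$ in (6.13) and (6.14). When the topology $\tau$ is completely regular (in particular when $d$ is a distance and $\tau$ is induced by $d$) the space $\mathrm{LSC}_s(X)$ can be replaced by $C_b(X)$ as well.

b) Optimality conditions. Let us assume that $d$ is continuous. A plan $\gamma \in \mathcal M(\boldsymbol X)$ is optimal if and only if its marginals $\gamma_i$ are absolutely continuous w.r.t. $\mu_i$, $\int_{\boldsymbol X} \ell(d)\,d\gamma < \infty$,
\[
d \ge \pi/2 \quad \text{in } \big((\mathrm{supp}\,\mu_1 \setminus \mathrm{supp}\,\gamma_1) \times \mathrm{supp}\,\mu_2\big) \cup \big(\mathrm{supp}\,\mu_1 \times (\mathrm{supp}\,\mu_2 \setminus \mathrm{supp}\,\gamma_2)\big), \tag{6.15}
\]
and there exist Borel sets $A_i \subset \mathrm{supp}\,\gamma_i$ with $\gamma_i(X \setminus A_i) = 0$ and Borel densities $\sigma_i : A_i \to (0,\infty)$ of $\gamma_i$ w.r.t. $\mu_i$ such that
\[
\sigma_1(x_1)\sigma_2(x_2) \ge \cos^2\big(d_{\pi/2}(x_1,x_2)\big) \quad \text{in } A_1 \times A_2, \tag{6.16}
\]
\[
\sigma_1(x_1)\sigma_2(x_2) = \cos^2\big(d_{\pi/2}(x_1,x_2)\big) \quad \gamma\text{-a.e. in } A_1 \times A_2. \tag{6.17}
\]

c) $\ell(d)$-cyclical monotonicity. Every optimal plan $\gamma \in \mathrm{Opt}_{\mathrm{LET}}(\mu_1,\mu_2)$ is a solution of the optimal transport problem with cost $\ell(d)$ between its marginals $\gamma_i$. In particular it is $\ell(d)$-cyclically monotone, i.e. it is concentrated on a Borel set $G \subset \boldsymbol X$ ($G = \mathrm{supp}(\gamma)$ when $d$ is continuous) such that for every choice of $(x_1^n,x_2^n)_{n=1}^N \subset G$ and every permutation $\kappa : \{1,\dots,N\} \to \{1,\dots,N\}$
\[
\prod_{n=1}^N \cos^2\big(d_{\pi/2}(x_1^n,x_2^n)\big) \ge \prod_{n=1}^N \cos^2\big(d_{\pi/2}(x_1^n,x_2^{\kappa(n)})\big). \tag{6.18}
\]

d) Generalized potentials. If $\gamma$ is optimal and $A_i$, $\sigma_i$ are defined as in b) above, the Borel potentials $\varphi_i, \psi_i : X \to \bar{\mathbb R}$
\[
\varphi_i := \begin{cases} -\log\sigma_i & \text{in } A_i,\\ -\infty & \text{in } X \setminus \mathrm{supp}\,\mu_i,\\ +\infty & \text{otherwise,} \end{cases} \qquad \psi_i := \begin{cases} 1 - \sigma_i & \text{in } A_i,\\ -\infty & \text{in } X \setminus \mathrm{supp}\,\mu_i,\\ 1 & \text{otherwise,} \end{cases} \tag{6.19}
\]
satisfy $\varphi_1 \oplus_o \varphi_2 \le \ell(d)$ and the optimality conditions (4.21) (with the analogous properties for $\psi_i$). Moreover $e^{-\varphi_i}, \psi_i \in L^1(X,\mu_i)$ and
\[
\mathrm{LET}(\mu_1,\mu_2) = \sum_i \int_X \big(1 - e^{-\varphi_i}\big)\,d\mu_i = \sum_i \int_X \psi_i\,d\mu_i = \sum_i \mu_i(X) - 2\gamma(\boldsymbol X). \tag{6.20}
\]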
Jx ,• Jx 
Proof. Identity (6.13) follows by Theorem 4.11, recalling the definition (4.11) of $\Phi$ and
the fact that $F^\circ(\varphi)=1-\exp(-\varphi)$.

Identity (6.14) follows from Proposition 4.3 and the fact that $R^*(\psi)=-\log(1-\psi)$.

Notice that the definition (4.7) of $\Psi$ ensures that we can restrict the supremum in (6.14)
to functions $\psi_i$ with $\sup_X\psi_i<1$. We have discussed the possibility to replace $\mathrm{LSC}_s(X)$
with $\mathrm{LSC}_b(X)$, $\mathrm B_b(X)$ or $\mathrm C_b(X)$ in Corollary 4.12.

The statement of point b) follows by Corollary 4.17; notice that a plan with finite energy
$\mathscr E(\gamma|\mu_1,\mu_2)<\infty$ always satisfies $\int\ell(\mathsf d)\,\mathrm d\gamma<\infty$. Conversely, if the latter integrability
property holds, (6.17) and the fact that $\int(\log\sigma_i)_-\,\mathrm d\gamma_i=\int\sigma_i(\log\sigma_i)_-\,\mathrm d\mu_i<\infty$ yield
$\mathscr E(\gamma|\mu_1,\mu_2)<\infty$.

Point c) is an obvious consequence of the optimality of $\gamma$.

Point d) can be easily deduced by b) or by applying Theorem 4.15. □

In the one-dimensional case, the $\ell(\mathsf d)$-cyclical monotonicity of part c) of the previous
theorem reduces to classical monotonicity.

Corollary 6.4 (Monotonicity of optimal plans in $\mathbb R$). When $X=\mathbb R$ with the usual
distance, the support of every optimal plan $\gamma$ is a monotone set, i.e.

$$(x_1,x_2),\ (x_1',x_2')\in\operatorname{supp}(\gamma),\quad x_1<x_1'\ \Rightarrow\ x_2\le x_2'.\tag{6.21}$$

Proof. As the function $\ell$ is uniformly convex, (6.18) is equivalent to monotonicity. □
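The monotonicity condition (6.21) is easy to test on a finite set of pairs. The following minimal sketch is only an illustration: the function name `is_monotone` and the sample point sets are hypothetical and do not come from the paper. It checks the implication $x_1<x_1'\Rightarrow x_2\le x_2'$ for every pair of points.

```python
from itertools import combinations

def is_monotone(points):
    """Return True if the finite set of pairs (x1, x2) is monotone in the sense of (6.21)."""
    for (a1, a2), (b1, b2) in combinations(points, 2):
        # Compare the two pairs in both orders; equal first coordinates impose no constraint.
        if a1 < b1 and a2 > b2:
            return False
        if b1 < a1 and b2 > a2:
            return False
    return True

# Hypothetical supports: the first is monotone, the second contains a crossing.
diagonal_like = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5), (3.0, 4.0)]
crossing = [(0.0, 1.0), (1.0, 0.0)]
print(is_monotone(diagonal_like), is_monotone(crossing))
```

Note that (6.21) allows vertical segments: two points with the same first coordinate never violate the condition.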

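The next result provides a variant of the reverse formulation in Theorem 6.2.

Corollary 6.5. For all $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{LET}(\mu_1,\mu_2)=\sum_i\mu_i(X)-2\max\Big\{\gamma(\boldsymbol X):\ \gamma\in\mathcal M(\boldsymbol X),\ \gamma_i=\sigma_i\mu_i,\ \sigma_1(x_1)\sigma_2(x_2)\le\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\ \ \gamma\text{-a.e. in }\boldsymbol X\Big\}.\tag{6.22}$$

Proof. Let us denote by $M'$ the right-hand side and let $\gamma\in\mathcal M(\boldsymbol X)$ be a plan satisfying
the conditions of (6.22). If $A_i$ are Borel sets with $\gamma_i(X\setminus A_i)=0$ and $\sigma_i:A_i\to(0,\infty)$
are Borel densities of $\gamma_i$ w.r.t. $\mu_i$, we have $\varrho_i(x_i)=1/\sigma_i(x_i)$ in $A_i$, so that $\sigma_1(x_1)\sigma_2(x_2)\le\cos^2(\mathsf d_{\pi/2}(x_1,x_2))$
yields $\varrho_1(x_1)\varrho_2(x_2)\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\ge1$. Since $(\log\varrho_i)_+\in L^1(X,\gamma_i)$ we
have

$$\begin{aligned}
&\sum_i\Big(\mu_i^\perp(X)+\int_X(\varrho_i-1-\log\varrho_i)\,\mathrm d\gamma_i\Big)+\int_{\boldsymbol X}\ell(\mathsf d(x_1,x_2))\,\mathrm d\gamma\\
&\qquad=\sum_i\big(\mu_i(X)-\gamma_i(X)\big)-\int_{\boldsymbol X}\log\big(\varrho_1(x_1)\varrho_2(x_2)\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\big)\,\mathrm d\gamma\le\sum_i\mu_i(X)-2\gamma(\boldsymbol X).
\end{aligned}$$

By (6.11) we get $M'\ge\mathsf{LET}(\mu_1,\mu_2)$. On the other hand, choosing any $\gamma\in\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$
the optimality condition (6.17) shows that $\gamma$ is an admissible competitor for (6.22) and
(6.20) shows that $M'=\mathsf{LET}(\mu_1,\mu_2)$. □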
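The nonnegative and concave functional $(\mu_1,\mu_2)\mapsto\sum_i\mu_i(X)-\mathsf{LET}(\mu_1,\mu_2)$ can be
represented in the following equivalent ways:

$$\begin{aligned}
\sum_i\mu_i(X)-\mathsf{LET}(\mu_1,\mu_2)
&=2\max_{\gamma\in\mathcal M(\boldsymbol X)}\int_{\boldsymbol X}\sqrt{\varrho_1(x_1)\varrho_2(x_2)}\,\cos(\mathsf d_{\pi/2}(x_1,x_2))\,\mathrm d\gamma&&(6.23)\\
&=\inf\Big\{\sum_i\int_X\mathrm e^{-\varphi_i}\,\mathrm d\mu_i:\ \varphi_i\in\mathrm{LSC}_s(X),\ \varphi_1\oplus\varphi_2\le\ell(\mathsf d)\Big\}&&(6.24)\\
&=\inf\Big\{\sum_i\int_X\psi_i\,\mathrm d\mu_i:\ \psi_i\in\mathrm{USC}_s(X),\ \inf_X\psi_i>0,\ \psi_1(x_1)\psi_2(x_2)\ge\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\ \text{in }\boldsymbol X\Big\}&&(6.25)\\
&=2\max\Big\{\gamma(\boldsymbol X):\ \gamma\in\mathcal M(\boldsymbol X),\ \gamma_i=\sigma_i\mu_i,\ \sigma_1(x_1)\sigma_2(x_2)\le\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\ \ \gamma\text{-a.e. in }\boldsymbol X\Big\}.&&(6.26)
\end{aligned}$$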

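The next result concerns uniqueness of the optimal plan $\gamma$ in the Euclidean case
$X=\mathbb R^d$. We will use the notion of approximate differential (denoted by $\tilde{\mathrm D}$), see e.g. [2,
Def. 5.5.1].

Theorem 6.6 (Uniqueness). Let $\mu_i\in\mathcal M(X)$ and $\gamma\in\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$.

(i) The marginals $\gamma_i=\pi^i_\sharp\gamma$ are uniquely determined.

(ii) If $X=\mathbb R$ with the usual distance then $\gamma$ is the unique element of $\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$.

(iii) If $X=\mathbb R^d$ with the usual distance, $\mu_1\ll\mathscr L^d$ is absolutely continuous, and $A_i\subset\mathbb R^d$
and $\sigma_i:A_i\to(0,\infty)$ are as in Theorem 6.3 b), then $\sigma_1$ is approximately differentiable
at $\gamma_1$-a.e. point of $A_1$ and $\gamma$ is the unique element of $\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$; it is
concentrated on the graph of a function $\boldsymbol t:\mathbb R^d\to\mathbb R^d$ satisfying

$$\boldsymbol t(x_1)=x_1+\frac{\arctan(|\xi(x_1)|)}{|\xi(x_1)|}\,\xi(x_1),\qquad\xi(x_1)=\tfrac12\tilde{\mathrm D}\log\sigma_1(x_1)\qquad\gamma_1\text{-a.e. in }A_1.\tag{6.27}$$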

Proof. (i) follows directly from Lemma 3.5.

(ii) follows by Theorem 6.3 c), since whenever the marginals $\gamma_i$ are fixed there is only
one plan with monotone support in $\mathbb R$.

In order to prove (iii) we adapt the argument of [2, Thm. 6.2.4] to our singular setting,
where the cost $\mathsf c$ can take the value $+\infty$.

Let $A_i\subset\mathbb R^d$ and $\sigma_i:A_i\to(0,\infty)$ be as in Theorem 6.3 b). Since $\mu_1=u\mathscr L^d\ll\mathscr L^d$ with
density $u\in L^1(\mathbb R^d)$, up to removing a $\mu_1$-negligible set (and thus $\gamma_1$-negligible) from $A_1$,
it is not restrictive to assume that $u(x_1)>0$ everywhere in $A_1$, so that the classes of $\mathscr L^d$-
and $\gamma_1$-negligible subsets of $A_1$ coincide. For every $n\in\mathbb N$ we define

$$A_{2,n}:=\{x_2\in A_2:\ \sigma_2(x_2)\ge1/n\},\qquad s_n(x_1):=\sup_{x_2\in A_{2,n}}\cos^2(|x_1-x_2|\wedge\pi/2)/\sigma_2(x_2).\tag{6.28}$$

The functions $s_n$ are bounded and Lipschitz in $\mathbb R^d$ and therefore differentiable $\mathscr L^d$-a.e. by
Rademacher's Theorem. Since $\gamma_1\ll\mu_1$ and $\mu_1$ is absolutely continuous w.r.t. $\mathscr L^d$ we
deduce that $s_n$ are differentiable $\gamma_1$-a.e. in $A_1$.
By (6.16) we have $\sigma_1(x_1)\ge s_n(x_1)$ in $A_1$. By (6.17) we know that for $\gamma_1$-a.e. $x_1\in A_1$
there exists $x_2\in A_2$ such that $|x_1-x_2|<\pi/2$ and $\sigma_1(x_1)=\cos^2(|x_1-x_2|)/\sigma_2(x_2)$, so that
$\sigma_1(x_1)=s_n(x_1)$ for $n$ sufficiently big and hence the family $(B_n)_{n\in\mathbb N}$ of sets $B_n:=\{x_1\in
A_1:\ \sigma_1(x_1)>s_n(x_1)\}$ is decreasing (since $s_n$ is increasing and dominated by $\sigma_1$) and has
$\mathscr L^d$-negligible intersection.

It follows that $\gamma_1$-a.e. $x_1\in A_1$ is a point of $\mathscr L^d$-density 1 of $\{x_1\in A_1:\ \sigma_1(x_1)=s_n(x_1)\}$
for some $n\in\mathbb N$ and $s_n$ is differentiable at $x_1$. Let us denote by $A_1'$ the set of all such points $x_1\in A_1$:
$\sigma_1$ is approximately differentiable at every $x_1\in A_1'$, with approximate differential
$\tilde{\mathrm D}\sigma_1(x_1)$ equal to $\mathrm Ds_n(x_1)$ for $n$ sufficiently big.

Suppose now that $x_1\in A_1'$ and $\sigma_1(x_1)=\cos^2(|x_1-x_2|)/\sigma_2(x_2)$ for some $x_2\in A_2$.
Since by (6.16) and (6.17) the map $x_1'\mapsto\cos^2(|x_1'-x_2|)/\sigma_1(x_1')$ attains its maximum at
$x_1'=x_1$, we deduce that

$$\tan(|x_1-x_2|)\,\frac{x_1-x_2}{|x_1-x_2|}=-\frac12\tilde{\mathrm D}\log\sigma_1(x_1),$$

so that $x_2$ is uniquely determined, and (6.27) follows. □
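The last step of the proof rests on the fact that the relation $\tan(|x_1-x_2|)\,(x_1-x_2)/|x_1-x_2|=\xi$ can be inverted when $|x_1-x_2|<\pi/2$, so that $x_2$ is recovered from $x_1$ and $\xi$ through the arctangent. The following sketch illustrates this inversion numerically; the helper names `xi_from_points` and `x2_from_xi` and the sample points are hypothetical, and the vector $\xi$ here is simply the left-hand side of the relation, not a computed approximate differential.

```python
import math

def xi_from_points(x1, x2):
    """Left-hand side tan(|x1 - x2|) (x1 - x2)/|x1 - x2| for points at distance < pi/2."""
    diff = [a - b for a, b in zip(x1, x2)]
    dist = math.sqrt(sum(c * c for c in diff))
    if dist == 0.0:
        return [0.0] * len(diff)
    if dist >= math.pi / 2:
        raise ValueError("the relation is only used for |x1 - x2| < pi/2")
    return [math.tan(dist) * c / dist for c in diff]

def x2_from_xi(x1, xi):
    """Invert the relation: x2 = x1 - arctan(|xi|) xi/|xi|."""
    norm = math.sqrt(sum(c * c for c in xi))
    if norm == 0.0:
        return list(x1)
    scale = math.atan(norm) / norm
    return [a - scale * c for a, c in zip(x1, xi)]

# Hypothetical points in R^3 at distance sqrt(0.34) < pi/2.
x1 = [0.2, -0.1, 0.4]
x2 = [0.5, 0.3, 0.1]
recovered = x2_from_xi(x1, xi_from_points(x1, x2))
print(recovered)
```

The round trip works because $|\xi|=\tan(|x_1-x_2|)$ and the arctangent inverts the tangent on $[0,\pi/2)$.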

We conclude this section with the last representation formula for $\mathsf{LET}(\mu_1,\mu_2)$ given in
terms of transport plans $\alpha$ in $\boldsymbol Y:=Y\times Y$ with $Y:=X\times[0,\infty)$ with constraints on the
homogeneous marginals, keeping the notation of Section 5.2. Even if it seems the most
complicated one, it will provide the natural point of view in order to study the metric
properties of the $\mathsf{LET}$ functional.

Theorem 6.7. For every $\mu_i\in\mathcal M(X)$ we have

$$\begin{aligned}
\mathsf{LET}(\mu_1,\mu_2)
&=\sum_i\mu_i(X)-2\max_{\alpha\in\mathcal H^2_\le(\mu_1,\mu_2)}\int_{\boldsymbol Y}r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\,\mathrm d\alpha&&(6.29)\\
&=\min\Big\{\int_{\boldsymbol Y}\big(r_1^2+r_2^2-2r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\big)\,\mathrm d\alpha+\sum_i(\mu_i-h_i^2\alpha)(X):\ \alpha\in\mathcal M(\boldsymbol Y),\ h_i^2\alpha\le\mu_i\Big\}&&(6.30)\\
&=\min\Big\{\int_{\boldsymbol Y}\big(r_1^2+r_2^2-2r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\big)\,\mathrm d\alpha:\ \alpha\in\mathcal M(\boldsymbol Y),\ h_i^2\alpha=\mu_i\Big\}.&&(6.31)
\end{aligned}$$

Moreover, for every plan $\gamma\in\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$ and every couple of Borel densities $\varrho_i$ as in
(6.11) the plan $\alpha:=\big(x_1,\sqrt{\varrho_1(x_1)};x_2,\sqrt{\varrho_2(x_2)}\big)_\sharp\gamma$ is optimal for (6.30) and (6.29).

Proof. Identity (6.30) (resp. (6.31)) follows directly by (5.31) (resp. (5.32)) of Theorem
5.8. Relation (6.29) is just a different form for (6.30). □


7 The metric side of the LET-functional: 
the Hellinger-Kantorovich distance 

In this section we want to show that the functional

$$(\mu_1,\mu_2)\mapsto\sqrt{\mathsf{LET}(\mu_1,\mu_2)}\tag{7.1}$$

defines a distance in $\mathcal M(X)$, which is then called the Hellinger-Kantorovich distance and
denoted $\mathsf{HK}$. This distance property is strongly related to the property that the function
$(x_1,r_1;x_2,r_2)\mapsto H(x_1,r_1^2;x_2,r_2^2)^{1/2}$ is a (possibly extended) semidistance in $Y=X\times[0,\infty)$.

In the next section we will briefly study this function and the induced metric space, the
so-called cone $\mathfrak C$ on $X$, [7, Sec. 3.6], obtained by taking the quotient w.r.t. the equivalence
classes of points with distance 0.

7.1 The cone construction 

In the extended metric-topological space $(X,\tau,\mathsf d)$ of Section 6.1 we will denote by $\mathsf d_a:=
\mathsf d\wedge a$ the truncated distance and by $y=(x,r)$, $x\in X$, $r\in[0,\infty)$, the generic points of
$Y:=X\times[0,\infty)$.

It is not difficult to show that the function $\mathsf d_{\mathfrak C}:Y\times Y\to[0,\infty)$

$$\mathsf d_{\mathfrak C}^2((x_1,r_1),(x_2,r_2)):=r_1^2+r_2^2-2r_1r_2\cos(\mathsf d_\pi(x_1,x_2))\tag{7.2}$$

is nonnegative, symmetric, and satisfies the triangle inequality (see e.g. [7, Prop. 3.6.13]).
We also notice that

$$\mathsf d_{\mathfrak C}^2(y_1,y_2)=|r_1-r_2|^2+4r_1r_2\sin^2\big(\mathsf d_\pi(x_1,x_2)/2\big),\tag{7.3}$$

which implies the useful estimates

$$\max\Big(|r_1-r_2|,\ \frac2\pi\sqrt{r_1r_2}\,\mathsf d_\pi(x_1,x_2)\Big)\le\mathsf d_{\mathfrak C}(y_1,y_2)\le|r_1-r_2|+\sqrt{r_1r_2}\,\mathsf d_\pi(x_1,x_2).\tag{7.4}$$

From this it follows that $\mathsf d_{\mathfrak C}$ induces a true distance in the quotient space $\mathfrak C=Y/\sim$ where

$$y_1\sim y_2\quad\Longleftrightarrow\quad r_1=r_2=0\ \ \text{or}\ \ r_1=r_2,\ x_1=x_2.\tag{7.5}$$
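The identity (7.3) and the two-sided estimate (7.4) can be sanity-checked numerically. The sketch below is only an illustration under the assumption that $X=\mathbb R$ with the usual distance; the function names and the random sampling are hypothetical choices, not part of the paper. The estimates are tested on the form (7.3), which avoids cancellation when the two points are close.

```python
import math
import random

def d_trunc(x1, x2, a):
    """Truncated distance d_a = min(|x1 - x2|, a) on the real line."""
    return min(abs(x1 - x2), a)

def cone_dist_sq(y1, y2):
    """Squared cone distance (7.2) between y1 = (x1, r1) and y2 = (x2, r2)."""
    (x1, r1), (x2, r2) = y1, y2
    return r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(d_trunc(x1, x2, math.pi))

def cone_dist_sq_alt(y1, y2):
    """Right-hand side of (7.3)."""
    (x1, r1), (x2, r2) = y1, y2
    return (r1 - r2) ** 2 + 4.0 * r1 * r2 * math.sin(d_trunc(x1, x2, math.pi) / 2.0) ** 2

def check_estimates(y1, y2, tol=1e-9):
    """Check the lower and upper bounds in (7.4)."""
    (x1, r1), (x2, r2) = y1, y2
    theta = d_trunc(x1, x2, math.pi)
    dist = math.sqrt(cone_dist_sq_alt(y1, y2))
    lower = max(abs(r1 - r2), (2.0 / math.pi) * math.sqrt(r1 * r2) * theta)
    upper = abs(r1 - r2) + math.sqrt(r1 * r2) * theta
    return lower <= dist + tol and dist <= upper + tol

rng = random.Random(0)
samples = [((rng.uniform(-5, 5), rng.uniform(0, 3)),
            (rng.uniform(-5, 5), rng.uniform(0, 3))) for _ in range(1000)]
identity_ok = all(abs(cone_dist_sq(a, b) - cone_dist_sq_alt(a, b)) < 1e-9 for a, b in samples)
estimates_ok = all(check_estimates(a, b) for a, b in samples)
print(identity_ok, estimates_ok)
```

The lower bound uses $\sin(\theta/2)\ge\theta/\pi$ and the upper bound uses $\sin(\theta/2)\le\theta/2$ for $\theta\in[0,\pi]$, which is exactly the range of the truncated distance $\mathsf d_\pi$.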

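Equivalence classes are usually denoted by $\mathfrak y=[y]=[x,r]$, where the vertex $[x,0]$ plays
a distinguished role. It is denoted by $\mathfrak o$, its complement is the open set $\mathfrak C_{\mathfrak o}=\mathfrak C\setminus\{\mathfrak o\}$.
On $\mathfrak C$ we introduce a topology $\tau_{\mathfrak C}$, which is in general weaker than the canonical quotient
topology: $\tau_{\mathfrak C}$ neighborhoods of points in $\mathfrak C_{\mathfrak o}$ coincide with neighborhoods in $Y$, whereas
the sets

$$\{[x,r]:\ 0\le r<\varepsilon\}=\{\mathfrak y\in\mathfrak C:\ \mathsf d_{\mathfrak C}(\mathfrak y,\mathfrak o)<\varepsilon\},\qquad\varepsilon>0,\tag{7.6}$$

provide a system of open neighborhoods of $\mathfrak o$. $\tau_{\mathfrak C}$ coincides with the quotient topology
when $X$ is compact.

It is easy to check that $(\mathfrak C,\tau_{\mathfrak C})$ is a Hausdorff topological space and $\mathsf d_{\mathfrak C}$ is $\tau_{\mathfrak C}$-lower
semicontinuous. If $\tau$ is induced by $\mathsf d$ then $\tau_{\mathfrak C}$ is induced by $\mathsf d_{\mathfrak C}$. If $(X,\mathsf d)$ is complete
(resp. separable), then $(\mathfrak C,\mathsf d_{\mathfrak C})$ is also complete (resp. separable).

Perhaps the simplest example is provided by the unit sphere $X=\mathbb S^{d-1}=\{x\in\mathbb R^d:
|x|=1\}$ in $\mathbb R^d$ endowed with the intrinsic Riemannian distance: the corresponding cone
$\mathfrak C$ is precisely $\mathbb R^d$.

We denote the canonical projection by

$$\mathsf p:Y\to\mathfrak C,\qquad\mathsf p(x,r)=[x,r].\tag{7.7}$$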
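Clearly $\mathsf p$ is continuous and is a homeomorphism between $Y\setminus(X\times\{0\})$ and $\mathfrak C_{\mathfrak o}$. A right
inverse $\mathsf y:\mathfrak C\to Y$ of the map $\mathsf p$ can be obtained by fixing a point $\bar x\in X$ and defining

$$\mathsf r([x,r]):=r,\qquad\mathsf x([x,r]):=\begin{cases}x&\text{if }r>0,\\\bar x&\text{if }r=0,\end{cases}\qquad\text{and}\qquad\mathsf y:=(\mathsf x,\mathsf r).\tag{7.8}$$

Notice that $\mathsf r$ is continuous and $\mathsf x$ is continuous restricted to $\mathfrak C_{\mathfrak o}$.

A continuous rescaling product from $\mathfrak C\times[0,\infty)$ to $\mathfrak C$ can be defined by

$$\mathfrak y\cdot\lambda:=\begin{cases}\mathfrak o&\text{if }\mathfrak y=\mathfrak o,\\ [x,\lambda r]&\text{if }\mathfrak y=[x,r],\ r>0.\end{cases}\tag{7.9}$$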

We conclude this introductory section by a characterization of compact sets in $(\mathfrak C,\tau_{\mathfrak C})$.

Lemma 7.1 (Compact sets in $\mathfrak C$). A closed set $K$ of $\mathfrak C$ is compact if and only if there is
$r_0>0$ such that its upper sections

$$K(\rho):=\{x\in X:\ [x,r]\in K\ \text{for some }r\ge\rho\}$$

are empty for $\rho>r_0$ and compact in $X$ for $0<\rho\le r_0$.

Proof. It is easy to check that the condition is necessary.

In order to show the sufficiency, let $\rho=\inf_K\mathsf r$. If $\rho>0$ then $K$ is compact since it is
a closed subset of the compact set $\mathsf p(K(\rho)\times[\rho,r_0])$.

If $\rho=0$ then $\mathfrak o$ is an accumulation point of $K$ by (7.6) and therefore $\mathfrak o\in K$ since $K$
is closed. If $\mathscr U$ is an open covering of $K$, we can pick $U_0\in\mathscr U$ such that $\mathfrak o\in U_0$. By (7.6)
there exists $\varepsilon>0$ such that $K\setminus U_0\subset\mathsf p(K(\varepsilon)\times[\varepsilon,r_0])$: since $\mathsf p(K(\varepsilon)\times[\varepsilon,r_0])$ is compact,
we can thus find a finite subcover $\{U_1,\dots,U_N\}\subset\mathscr U$ of $K\setminus U_0$. $\{U_n\}_{n=0}^N$ is therefore a
finite subcover of $K$. □


Remark 7.2 (Two different truncations). Notice that in the constitutive formula defining
$\mathsf d_{\mathfrak C}$ we used the truncated distance $\mathsf d_\pi$ with upper threshold $\pi$, whereas in Theorem 6.7
an analogous formula with $\mathsf d_{\pi/2}$ and threshold $\pi/2$ played a crucial role. We could then
consider the distance

$$\mathsf d_{\pi/2,\mathfrak C}^2([x_1,r_1],[x_2,r_2]):=r_1^2+r_2^2-2r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\tag{7.10a}$$

$$\phantom{\mathsf d_{\pi/2,\mathfrak C}^2([x_1,r_1],[x_2,r_2]):}=|r_1-r_2|^2+4r_1r_2\sin^2\big(\mathsf d_{\pi/2}(x_1,x_2)/2\big)\tag{7.10b}$$

on $\mathfrak C$, which satisfies

$$\mathsf d_{\pi/2,\mathfrak C}\le\mathsf d_{\mathfrak C}\le\sqrt2\,\mathsf d_{\pi/2,\mathfrak C}.\tag{7.11}$$

The notation (7.10a) is justified by the fact that $\mathsf d_{\pi/2,\mathfrak C}$ is still a cone distance associated
to the metric space $(X,\mathsf d_{\pi/2})$, since obviously $(\mathsf d_{\pi/2})_\pi=(\mathsf d_{\pi/2})\wedge\pi=\mathsf d_{\pi/2}$. From the
geometric point of view, the choice of $\mathsf d_{\mathfrak C}$ is natural, since it preserves important metric
properties concerning geodesics (see [7, Thm. 3.6.17] and the next section) and
curvature (see [7, Sect. 4.7] and the next section).

On the other hand, the choice of $\mathsf d_{\pi/2}$ is crucial for its link with the function $H$ of (6.9),
with Entropy-Transport problems, and with a representation property for the Hopf-Lax
formula that we will see in the next sections. Notice that the 1-homogeneous formula (6.7)
would not be convex in $(r_1,r_2)$ if one uses $\mathsf d_\pi$ instead of $\mathsf d_{\pi/2}$. Nevertheless, we will prove
in Section 7.3 the remarkable fact that both $\mathsf d_\pi$ and $\mathsf d_{\pi/2}$ will lead to the same distance
between positive measures. □
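The comparison (7.11) between the two cone distances can be checked numerically. The sketch below is an illustration only, assuming $X=\mathbb R$ with the usual distance; the helper names and the sampling are hypothetical. It compares squared distances, i.e. it tests $\mathsf d_{\pi/2,\mathfrak C}^2\le\mathsf d_{\mathfrak C}^2\le2\,\mathsf d_{\pi/2,\mathfrak C}^2$, and evaluates one configuration where the constant $\sqrt2$ is attained.

```python
import math
import random

def cone_sq(x1, r1, x2, r2, threshold):
    """Squared cone distance in the form (7.3)/(7.10b) with truncation at `threshold`."""
    theta = min(abs(x1 - x2), threshold)
    return (r1 - r2) ** 2 + 4.0 * r1 * r2 * math.sin(theta / 2.0) ** 2

def comparison_holds(x1, r1, x2, r2, tol=1e-12):
    """Squared version of (7.11), up to a small rounding tolerance."""
    half = cone_sq(x1, r1, x2, r2, math.pi / 2)
    full = cone_sq(x1, r1, x2, r2, math.pi)
    return half <= full + tol and full <= 2.0 * half + tol

rng = random.Random(1)
all_ok = all(
    comparison_holds(rng.uniform(-4, 4), rng.uniform(0, 3),
                     rng.uniform(-4, 4), rng.uniform(0, 3))
    for _ in range(2000)
)
# Equal radii and base points at distance >= pi: the ratio of squared distances equals 2.
extremal_ratio = cone_sq(0.0, 1.0, 4.0, 1.0, math.pi) / cone_sq(0.0, 1.0, 4.0, 1.0, math.pi / 2)
print(all_ok, extremal_ratio)
```

For base points at distance at most $\pi/2$ the two distances coincide, so the comparison is only non-trivial beyond that threshold.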
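7.2 Radon measures in the cone and homogeneous marginals

It is clear that any measure $\nu\in\mathcal M(\mathfrak C)$ can be lifted to a measure $\bar\nu\in\mathcal M(Y)$ such that
$\mathsf p_\sharp\bar\nu=\nu$: it is sufficient to take $\bar\nu=\mathsf y_\sharp\nu$ where $\mathsf y$ is a right inverse of $\mathsf p$ defined as in (7.8).

We call $\mathcal M_2(\mathfrak C)$ (resp. $\mathcal P_2(\mathfrak C)$) the space of measures $\nu\in\mathcal M(\mathfrak C)$ (resp. $\nu\in\mathcal P(\mathfrak C)$) such
that

$$\int_{\mathfrak C}\mathsf r^2\,\mathrm d\nu=\int_{\mathfrak C}\mathsf d_{\mathfrak C}^2(\mathfrak y,\mathfrak o)\,\mathrm d\nu=\int_Yr^2\,\mathrm d\bar\nu<\infty,\qquad\bar\nu=\mathsf y_\sharp\nu.\tag{7.12}$$

Measures in $\mathcal M_2(\mathfrak C)$ thus correspond to images $\mathsf p_\sharp\bar\nu$ of measures $\bar\nu\in\mathcal M_2(Y)$ and have
finite second moment w.r.t. the distance $\mathsf d_{\mathfrak C}$, which justifies the index 2 in $\mathcal M_2(\mathfrak C)$. Notice
moreover that the measure $r^2\bar\nu$ does not charge $X\times\{0\}$ and it is independent of the
choice of the point $\bar x$ in (7.8).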

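The above considerations can be easily extended to plans in the product spaces $\mathfrak C^{\otimes N}$
(where typically $N=2$, but also the general case will turn out to be useful later on). To
clarify the notation, we will denote by $\boldsymbol{\mathfrak y}=(\mathfrak y_i)_{i=1}^N=([x_i,r_i])_{i=1}^N$ a point in $\mathfrak C^{\otimes N}$ and we
will set $\mathsf r_i(\boldsymbol{\mathfrak y})=\mathsf r(\mathfrak y_i)=r_i$, $\mathsf x_i(\boldsymbol{\mathfrak y})=\mathsf x(\mathfrak y_i)\in X$. Projections on the $i$-coordinate from $\mathfrak C^{\otimes N}$
to $\mathfrak C$ are usually denoted by $\pi^i$ or $\pi^{\mathfrak y_i}$; $\boldsymbol{\mathsf p}=\mathsf p^{\otimes N}:Y^{\otimes N}\to\mathfrak C^{\otimes N}$ and $\boldsymbol{\mathsf y}=\mathsf y^{\otimes N}:\mathfrak C^{\otimes N}\to Y^{\otimes N}$
are the Cartesian products of the projections and of the lifts.

Recall that the $L^2$-Kantorovich-Wasserstein (extended) distance $\mathsf W_{\mathsf d_{\mathfrak C}}$ in $\mathcal M_2(\mathfrak C)$ induced
by $\mathsf d_{\mathfrak C}$ is defined by

$$\mathsf W_{\mathsf d_{\mathfrak C}}^2(\nu_1,\nu_2):=\min\Big\{\int\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,\mathrm d\alpha:\ \alpha\in\mathcal M(\mathfrak C\otimes\mathfrak C),\ \pi^i_\sharp\alpha=\nu_i\Big\},\tag{7.13}$$

with the convention that $\mathsf W_{\mathsf d_{\mathfrak C}}(\nu_1,\nu_2)=+\infty$ if $\nu_1(\mathfrak C)\ne\nu_2(\mathfrak C)$ and thus the minimum in
(7.13) is taken on an empty set. We want to mimic the above definition, replacing the
usual marginal conditions in (7.13) with the homogeneous marginals $\mathfrak h_i^2$ which we are
going to define.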

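Let us consider now a plan $\alpha$ in $\mathcal M(\mathfrak C^{\otimes N})$ with $\bar\alpha=\boldsymbol{\mathsf y}_\sharp\alpha\in\mathcal M(Y^{\otimes N})$: we say that $\alpha$
lies in $\mathcal M_2(\mathfrak C^{\otimes N})$ if

$$\int_{\mathfrak C^{\otimes N}}|\boldsymbol{\mathfrak y}|^2\,\mathrm d\alpha=\int_{Y^{\otimes N}}\sum_ir_i^2\,\mathrm d\bar\alpha<\infty.\tag{7.14}$$

Its "canonical" marginals in $\mathcal M(\mathfrak C)$ are $\alpha_i=\pi^i_\sharp\alpha$, whereas the "homogeneous" marginals
correspond to (5.23) with $p=2$:

$$\mathfrak h_i^2(\alpha):=(\mathsf x_i)_\sharp(\mathsf r_i^2\alpha)=h_i^2(\bar\alpha),\qquad\bar\alpha:=\boldsymbol{\mathsf y}_\sharp\alpha.\tag{7.15}$$

We will omit the index $i$ when $N=1$. Notice that $\mathsf r_i^2\alpha$ does not charge $(\pi^i)^{-1}(\mathfrak o)$ (similarly,
$r_i^2\bar\alpha$ does not charge $Y^{i-1}\times\{(\bar x,0)\}\times Y^{N-i}$) so that (7.15) is independent of the choice
of the point $\bar x$ in (7.8).

As for (5.25), the homogeneous marginals on the cone are invariant with respect to
dilations: if $\vartheta:\mathfrak C^{\otimes N}\to(0,\infty)$ is a Borel map in $L^2(\mathfrak C^{\otimes N},\alpha)$ we set

$$(\mathrm{prd}_\vartheta(\boldsymbol{\mathfrak y}))_i:=\mathfrak y_i\cdot(\vartheta(\boldsymbol{\mathfrak y}))^{-1}\qquad\text{and}\qquad\mathrm{dil}_{\vartheta,2}(\alpha):=(\mathrm{prd}_\vartheta)_\sharp(\vartheta^2\alpha),\tag{7.16}$$

so that

$$\mathfrak h_i^2(\mathrm{dil}_{\vartheta,2}(\alpha))=\mathfrak h_i^2(\alpha)\qquad\text{for every }\alpha\in\mathcal M_2(\mathfrak C^{\otimes N}).\tag{7.17}$$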
As for the canonical marginals, a uniform control of the homogeneous marginals is sufficient
to get equal tightness, cf. (2.4) for the definition. We state this result for an arbitrary
number of components, and we emphasize that we are not claiming any closedness of the
involved sets.

Lemma 7.3 (Homogeneous marginals and tightness). Let $\mathcal K_i$, $i=1,\dots,N$, be a finite
collection of bounded and equally tight sets in $\mathcal M(X)$. Then, the set

$$\{\alpha\in\mathcal M_2(\mathfrak C^{\otimes N}):\ \mathfrak h_i^2\alpha\in\mathcal K_i\ \text{for }i=1,\dots,N\}\tag{7.18}$$

is equally tight in $\mathcal M(\mathfrak C^{\otimes N})$.

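Proof. By applying [2, Lem. 5.2.2], it is sufficient to consider the case $N=1$; given a
bounded and equally tight set $\mathcal K\subset\mathcal M(X)$ we prove that $\mathcal H:=\{\alpha\in\mathcal M_2(\mathfrak C):\ \mathfrak h^2\alpha\in\mathcal K\}$
is equally tight. For $A\subset X$, $R\subset(0,\infty)$ we will use the short notation $A\times_{\mathfrak C}R$ for
$\mathsf p(A\times R)\subset\mathfrak C$. If $A$ and $R$ are compact, then $A\times_{\mathfrak C}R$ is compact in $\mathfrak C$.

Let $M:=\sup_{\mu\in\mathcal K}\mu(X)<\infty$; since $\mathcal K$ is tight, we can find an increasing sequence of
compact sets $K_n\subset X$ such that $\mu(X\setminus K_n)\le8^{-n}$ for every $\mu\in\mathcal K$. For an integer $m\in\mathbb N$
we then consider the compact sets $\mathfrak A_m\subset\mathfrak C$ defined by

$$\mathfrak A_m=\{\mathfrak o\}\cup K_m\times_{\mathfrak C}[2^{-m},2^m]\cup\Big(\bigcup_{n=1}^\infty K_{n+m}\times_{\mathfrak C}[2^{-n},2^{-n+1}]\Big).\tag{7.19}$$

Setting $K_\infty=\bigcup_{n=1}^\infty K_n$ we have $\mu(X\setminus K_\infty)=0$ and

$$\mathfrak C\setminus\mathfrak A_m\subset K_m\times_{\mathfrak C}(2^m,\infty)\cup\Big(\bigcup_{n=1}^\infty(K_{n+m}\setminus K_{n+m-1})\times_{\mathfrak C}(2^{-n+1},\infty)\Big)\cup(X\setminus K_\infty)\times_{\mathfrak C}(0,\infty).$$

Since for every $\alpha\in\mathcal H$ with $\mathfrak h^2\alpha=\mu$ and every $A\in\mathcal B(X)$ we have

$$\alpha(A\times_{\mathfrak C}(s,\infty))\le s^{-2}\mu(A)\le s^{-2}M\qquad\text{and}\qquad\alpha((X\setminus K_\infty)\times_{\mathfrak C}(0,\infty))=0,$$

we conclude, for $m\ge1$,

$$\alpha(\mathfrak C\setminus\mathfrak A_m)\le4^{-m}M+\sum_{n=1}^\infty\alpha\big((X\setminus K_{n+m-1})\times_{\mathfrak C}(2^{-n+1},\infty)\big)\le4^{-m}M+\sum_{n=1}^\infty4^{n-1}8^{1-n-m}=4^{-m}M+2\cdot8^{-m}\le4^{-m}(1+M)$$

for every $\alpha\in\mathcal H$. Since all $\mathfrak A_m$ are compact, we obtain the desired equal tightness. □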
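The final estimate of the proof relies on a geometric series with terms $4^{n-1}8^{1-n-m}$, the product of $s^{-2}$ for $s=2^{1-n}$ and of the tightness bound $8^{-(n+m-1)}$. As a sanity check, the sketch below evaluates partial sums of this series in exact rational arithmetic; the function name `tail_sum` and the truncation at finitely many terms are hypothetical choices made for illustration.

```python
from fractions import Fraction

def tail_sum(m, terms=60):
    """Exact partial sum of sum_{n>=1} 4^(n-1) * 8^(1-n-m), truncated after `terms` terms."""
    return sum(Fraction(4) ** (n - 1) * Fraction(8) ** (1 - n - m)
               for n in range(1, terms + 1))

# Each term equals 2^(1-n-3m), so the full series sums to 2 * 8^(-m),
# which is bounded by 4^(-m) as soon as m >= 1.
for m in range(1, 6):
    partial = tail_sum(m)
    print(m, float(partial), float(Fraction(2, 8 ** m)), float(Fraction(1, 4 ** m)))
```

Since the partial sums increase to $2\cdot8^{-m}$, the contribution of the series is of lower order than the leading term $4^{-m}M$ as $m\to\infty$.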


7.3 The Hellinger-Kantorovich problem 

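In this section we will always consider $N=2$, keeping the shorter notation $\boldsymbol Y=Y^{\otimes2}$ and
$\boldsymbol{\mathfrak C}=\mathfrak C^{\otimes2}$. As for (5.27), for every $\mu_1,\mu_2\in\mathcal M(X)$ we define the sets

$$\mathfrak H^2_\le(\mu_1,\mu_2):=\{\alpha\in\mathcal M_2(\boldsymbol{\mathfrak C}):\ \mathfrak h_i^2\alpha\le\mu_i\}\quad\text{and}\quad\mathfrak H^2_=(\mu_1,\mu_2):=\{\alpha\in\mathcal M_2(\boldsymbol{\mathfrak C}):\ \mathfrak h_i^2\alpha=\mu_i\}.\tag{7.20}$$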
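They are the images of $\mathcal H^2_\le(\mu_1,\mu_2)$ and $\mathcal H^2_=(\mu_1,\mu_2)$ through the projections $\boldsymbol{\mathsf p}_\sharp$; in particular
they always contain plans $\boldsymbol{\mathsf p}_\sharp\bar\alpha$, where $\bar\alpha$ is given by (5.28). The condition $\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)$
is equivalent to ask that

$$\int\mathsf r_i^2\varphi(\mathsf x_i)\,\mathrm d\alpha\le\int\varphi\,\mathrm d\mu_i\qquad\text{for every nonnegative }\varphi\in\mathrm B_b(X).\tag{7.21}$$

We can thus define the following minimum problem:

Problem 7.4 (The Hellinger-Kantorovich problem). Given $\mu_1,\mu_2\in\mathcal M(X)$ find an optimal
plan $\alpha_{\rm opt}\in\mathfrak H^2_=(\mu_1,\mu_2)\subset\mathcal M_2(\boldsymbol{\mathfrak C})$ solving the minimum problem

$$\mathsf{HK}^2(\mu_1,\mu_2):=\min\Big\{\int\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,\mathrm d\alpha:\ \alpha\in\mathcal M_2(\boldsymbol{\mathfrak C}),\ \mathfrak h_i^2\alpha=\mu_i\Big\}.\tag{7.22}$$

We denote by $\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)\subset\mathcal M(\boldsymbol{\mathfrak C})$ the collection of all the optimal plans $\alpha$ realizing
the minimum in (7.22) and by $\mathsf{HK}^2(\mu_1,\mu_2)$ the value of the minimum in (7.22) (whose
existence is guaranteed by the next Theorem 7.6).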


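Remark 7.5 (Lifting of plans in $\boldsymbol Y$). Since any plan $\alpha\in\mathcal M(\boldsymbol{\mathfrak C})$ can be lifted to a plan
$\bar\alpha=\boldsymbol{\mathsf y}_\sharp\alpha\in\mathcal M(Y\times Y)$ such that $\boldsymbol{\mathsf p}_\sharp\bar\alpha=\alpha$, the previous problem 7.4 is also equivalent to
find

$$\min\Big\{\int\mathsf d_{\mathfrak C}^2(y_1,y_2)\,\mathrm d\bar\alpha:\ \bar\alpha\in\mathcal M(Y\times Y),\ h_i^2\bar\alpha=\mu_i\Big\}.\tag{7.23}$$

The advantage of working in the quotient space $\mathfrak C$ is to gain compactness, as the next Theorem
7.6 will show. □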


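An important feature of the cone distance and of the homogeneous marginals is an invariance
under rescaling, which can be done by the dilations from (7.16). Let us set

$$\mathfrak C[R]:=\{[x,r]\in\mathfrak C:\ r\le R\}\qquad\text{and}\qquad\boldsymbol{\mathfrak C}[R]:=\mathfrak C[R]\times\mathfrak C[R].\tag{7.24}$$

It is not restrictive to solve the previous problem 7.4 by also assuming that $\alpha$ is a probability
plan in $\mathcal P(\boldsymbol{\mathfrak C})$ concentrated on $\boldsymbol{\mathfrak C}[R]$ with

$$\mathsf{HK}^2(\mu_1,\mu_2)=\min_{\alpha\in\mathcal C}\int\mathsf d_{\mathfrak C}^2\,\mathrm d\alpha,\qquad\mathcal C:=\{\alpha\in\mathcal P(\boldsymbol{\mathfrak C}):\ \mathfrak h_i^2\alpha=\mu_i,\ \alpha(\boldsymbol{\mathfrak C}\setminus\boldsymbol{\mathfrak C}[R])=0\}.\tag{7.25}$$

In fact the functional $\mathsf d_{\mathfrak C}^2$ and the constraints have a natural scaling invariance induced by
the dilation maps defined by (7.16). Since

$$\int\mathsf d_{\mathfrak C}^2\,\mathrm d(\mathrm{dil}_{\vartheta,2}(\alpha))=\int\vartheta^2\,\mathsf d_{\mathfrak C}^2\big([x_1,r_1/\vartheta];[x_2,r_2/\vartheta]\big)\,\mathrm d\alpha=\int\mathsf d_{\mathfrak C}^2\,\mathrm d\alpha,\tag{7.26}$$

restricting first $\alpha$ to $\boldsymbol{\mathfrak C}\setminus\{(\mathfrak o,\mathfrak o)\}$ and then choosing $\vartheta$ as in (5.26a) with $p=2$ we obtain a
probability plan $\mathrm{dil}_{\vartheta,2}(\alpha\llcorner\boldsymbol{\mathfrak C}\setminus\{(\mathfrak o,\mathfrak o)\})$ in $\mathfrak H^2_=(\mu_1,\mu_2)$ concentrated in $\boldsymbol{\mathfrak C}[R]\setminus\{(\mathfrak o,\mathfrak o)\}$ with
the same cost $\int\mathsf d_{\mathfrak C}^2\,\mathrm d\alpha$. In order to show that Problem 7.4 has a solution we can then
use the formulation (7.25) and prove that the set $\mathcal C$ where the minimum will be found
is narrowly compact in $\mathcal P(\boldsymbol{\mathfrak C})$. Notice that the analogous property would not be true in
$\mathcal P(Y\times Y)$ (unless $X$ is compact) since measures concentrated in $(X\times\{0\})\times(X\times\{0\})$
would be out of control. Also the constraints $\mathfrak h_i^2\alpha=\mu_i$ would not be preserved by narrow
convergence, if one allows for arbitrary plans in $\mathcal P(\boldsymbol{\mathfrak C})$ as in (7.22).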
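The scaling invariance (7.26) and the invariance (7.17) of the homogeneous marginals can be illustrated on a plan with finitely many atoms. The sketch below assumes $X=\mathbb R$ and represents a plan as a list of weighted pairs of cone points. The normalisation $\vartheta=|\boldsymbol{\mathfrak y}|/r_*$ with $r_*^2=\int|\boldsymbol{\mathfrak y}|^2\,\mathrm d\alpha$ is our reading of the choice referred to as (5.26a), which is not reproduced in this section. All function names and numerical values are hypothetical.

```python
import math

def cone_sq(x1, r1, x2, r2):
    """Squared cone distance (7.2) over X = R, with the distance truncated at pi."""
    theta = min(abs(x1 - x2), math.pi)
    return r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(theta)

def cost(plan):
    """Integral of the squared cone distance against a discrete plan."""
    return sum(w * cone_sq(x1, r1, x2, r2) for w, (x1, r1), (x2, r2) in plan)

def homogeneous_marginal(plan, i):
    """Homogeneous marginal h_i^2 as a dict x -> total of w * r_i^2 (i = 0 or 1)."""
    out = {}
    for atom in plan:
        w = atom[0]
        x, r = atom[1 + i]
        if r > 0.0:
            out[x] = out.get(x, 0.0) + w * r * r
    return out

def dilate(plan, theta):
    """dil_{theta,2}: weight w -> theta^2 w, radii r -> r / theta, as in (7.16)."""
    out = []
    for w, (x1, r1), (x2, r2) in plan:
        t = theta(r1, r2)
        out.append((t * t * w, (x1, r1 / t), (x2, r2 / t)))
    return out

def normalize(plan):
    """Rescale to a probability plan; assumes every atom has r1^2 + r2^2 > 0."""
    r_star = math.sqrt(sum(w * (r1 * r1 + r2 * r2) for w, (_, r1), (_, r2) in plan))
    return dilate(plan, lambda r1, r2: math.sqrt(r1 * r1 + r2 * r2) / r_star), r_star

# Hypothetical plan with three atoms (weight, (x1, r1), (x2, r2)).
plan = [(0.5, (0.0, 2.0), (1.0, 1.0)),
        (3.0, (0.3, 0.5), (2.0, 1.5)),
        (1.2, (0.0, 1.0), (2.0, 0.0))]
normalized, r_star = normalize(plan)
print(sum(w for w, _, _ in normalized), cost(plan), cost(normalized))
```

After the rescaling every atom satisfies $r_1^2+r_2^2=r_*^2$, so in particular both radii are bounded by $r_*$, while the total mass of the plan becomes 1.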
Theorem 7.6 (Existence of optimal plans for the HK problem). For every $\mu_1,\mu_2\in\mathcal M(X)$
the Hellinger-Kantorovich problem 7.4 always admits a solution $\alpha\in\mathcal P(\boldsymbol{\mathfrak C})$ concentrated
on $\boldsymbol{\mathfrak C}[R]\setminus\{(\mathfrak o,\mathfrak o)\}$ with $R^2=\sum_i\mu_i(X)$.

Proof. By the rescaling (7.26) it is not restrictive to look for minimizers $\alpha$ of (7.25). Since
$\boldsymbol{\mathfrak C}[R]$ is closed in $\boldsymbol{\mathfrak C}$ and the maps $\mathsf r_i^2$ are continuous and bounded in $\boldsymbol{\mathfrak C}[R]$, $\mathcal C$ is clearly
narrowly closed. By Lemma 7.3 $\mathcal C$ is also equally tight in $\mathcal P(\boldsymbol{\mathfrak C})$, thus narrowly compact
by Theorem 2.2. Since $\mathsf d_{\mathfrak C}^2$ is lower semicontinuous in $\boldsymbol{\mathfrak C}$, the existence of a minimizer
of (7.25) then follows by the direct method of the calculus of variations. □

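We can also prove an interesting characterization of $\mathsf{HK}$ in terms of the $L^2$-Kantorovich-
Wasserstein distance on $\mathcal P_2(\mathfrak C)$ given by (7.13). An even deeper connection will be discussed
in the next section, see Corollary 7.13.

Corollary 7.7 ($\mathsf{HK}$ and the Wasserstein distance on $\mathcal P_2(\mathfrak C)$). For every $\mu_1,\mu_2\in\mathcal M(X)$ we
have

$$\mathsf{HK}(\mu_1,\mu_2)=\min\Big\{\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2):\ \alpha_i\in\mathcal P_2(\mathfrak C),\ \mathfrak h^2\alpha_i=\mu_i\Big\},\tag{7.27}$$

and there exist optimal measures $\alpha_i$ for (7.27) concentrated on $\mathfrak C[R]$ with $R$ as in Theorem 7.6.
In particular the map $\mathfrak h^2:\mathcal P_2(\mathfrak C)\to\mathcal M(X)$ is a contraction, i.e.

$$\mathsf{HK}(\mathfrak h^2\alpha_1,\mathfrak h^2\alpha_2)\le\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)\qquad\text{for every }\alpha_i\in\mathcal P_2(\mathfrak C).\tag{7.28}$$

Proof. If $\alpha_i\in\mathcal P_2(\mathfrak C)$ with $\mathfrak h^2\alpha_i=\mu_i$ then any Kantorovich-Wasserstein optimal plan $\alpha\in
\mathcal M(\mathfrak C\times\mathfrak C)$ for (7.13) with marginals $\alpha_i$ clearly belongs to $\mathfrak H^2_=(\mu_1,\mu_2)$ and yields the bound
$\mathsf{HK}(\mu_1,\mu_2)\le\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)$. On the other hand, if $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ is an optimal solution
for (7.22) and $\alpha_i:=\pi^i_\sharp\alpha\in\mathcal P_2(\mathfrak C)$ are its marginals, we have $\mathsf{HK}(\mu_1,\mu_2)\ge\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)$, so
that $\alpha_i$ realize the minimum for (7.27). □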

We conclude this section with two simple properties of the $\mathsf{HK}$ functional. We denote
by $\eta_0$ the null measure.

Lemma 7.8 (Subadditivity of $\mathsf{HK}^2$). The functional $\mathsf{HK}^2$ satisfies

$$\mathsf{HK}^2(\mu,\eta_0)=\mu(X),\qquad\mathsf{HK}^2(\mu_1,\mu_2)\le\mu_1(X)+\mu_2(X)\qquad\text{for every }\mu,\mu_i\in\mathcal M(X),\tag{7.29}$$

and it is subadditive, i.e. for every $\mu_i,\mu_i'\in\mathcal M(X)$ we have

$$\mathsf{HK}^2(\mu_1+\mu_1',\mu_2+\mu_2')\le\mathsf{HK}^2(\mu_1,\mu_2)+\mathsf{HK}^2(\mu_1',\mu_2').\tag{7.30}$$

Proof. The relations in (7.29) are obvious. If $\alpha\in\mathfrak H^2_=(\mu_1,\mu_2)$ and $\alpha'\in\mathfrak H^2_=(\mu_1',\mu_2')$ it is
easy to check that $\alpha+\alpha'\in\mathfrak H^2_=(\mu_1+\mu_1',\mu_2+\mu_2')$. Since the cost functional is linear with
respect to the plan, we get (7.30). □

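Subsequently we will use "$\llcorner$" for the restriction of measures.

Lemma 7.9 (A formulation with relaxed constraints). For every $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{HK}^2(\mu_1,\mu_2)=\min_{\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)}\Big\{\int\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,\mathrm d\alpha+\sum_i(\mu_i-\mathfrak h_i^2\alpha)(X)\Big\}\tag{7.31a}$$

$$\phantom{\mathsf{HK}^2(\mu_1,\mu_2)}=\mu_1(X)+\mu_2(X)-\max_{\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)}\Big(2\int r_1r_2\cos(\mathsf d_\pi(x_1,x_2))\,\mathrm d\alpha\Big).\tag{7.31b}$$

Moreover,

(i) equations (7.31a)-(7.31b) share the same class of optimal plans.

(ii) A plan $\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)$ is optimal for (7.31a)-(7.31b) if and only if the plan $\alpha_{\mathfrak o}:=
\alpha\llcorner(\mathfrak C_{\mathfrak o}\times\mathfrak C_{\mathfrak o})$ is optimal as well.

(iii) If $\alpha$ is optimal for (7.31a)-(7.31b) with $\mu_i':=\mu_i-\mathfrak h_i^2\alpha$, then $\tilde\alpha:=\alpha+\alpha'$ is an
optimal plan in $\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ for all $\alpha'\in\mathrm{Opt}_{\mathsf{HK}}(\mu_1',\mu_2')$.

(iv) A plan $\alpha\in\mathfrak H^2_=(\mu_1,\mu_2)$ belongs to $\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ if and only if $\alpha_{\mathfrak o}:=\alpha\llcorner(\mathfrak C_{\mathfrak o}\times\mathfrak C_{\mathfrak o})$ is
optimal for (7.31a)-(7.31b).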

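Proof. The formulas (7.31a) and (7.31b) are just two different ways to write the same
functional, since for every $\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)$ we have

$$\int\mathsf d_{\mathfrak C}^2\,\mathrm d\alpha+\sum_i(\mu_i-\mathfrak h_i^2\alpha)(X)=\sum_i\mu_i(X)-2\int r_1r_2\cos(\mathsf d_\pi(x_1,x_2))\,\mathrm d\alpha.\tag{7.32}$$

Thus, to prove (i) it is sufficient to show (7.31a). The inequality $\ge$ is obvious, since
$\mathfrak H^2_\le(\mu_1,\mu_2)\supset\mathfrak H^2_=(\mu_1,\mu_2)$ and for every $\alpha\in\mathfrak H^2_=(\mu_1,\mu_2)$ the term $\sum_i(\mu_i-\mathfrak h_i^2\alpha)(X)$
vanishes.

On the other hand, whenever $\alpha\in\mathfrak H^2_\le(\mu_1,\mu_2)$, setting $\mu_i'':=\mathfrak h_i^2\alpha\in\mathcal M(X)$, $\mu_i':=\mu_i-\mu_i''$
and observing that $\alpha\in\mathfrak H^2_=(\mu_1'',\mu_2'')$ we get

$$\begin{aligned}
\int\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,\mathrm d\alpha+\sum_i(\mu_i-\mathfrak h_i^2\alpha)(X)
&\ge\mathsf{HK}^2(\mu_1'',\mu_2'')+\mu_1'(X)+\mu_2'(X)\\
&\overset{(7.29)}{\ge}\mathsf{HK}^2(\mu_1'',\mu_2'')+\mathsf{HK}^2(\mu_1',\mu_2')\overset{(7.30)}{\ge}\mathsf{HK}^2(\mu_1,\mu_2).
\end{aligned}$$

The same calculations also prove point (iii).

In order to check (ii) it is sufficient to observe that the integrand in (7.31b) vanishes
on $\boldsymbol{\mathfrak C}\setminus(\mathfrak C_{\mathfrak o}\times\mathfrak C_{\mathfrak o})$.

Finally, if $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ is optimal for (7.22), then by the consideration above it
is optimal for (7.31b) and therefore (ii) shows that $\alpha_{\mathfrak o}$ is optimal as well. The converse
implication follows by (iii). □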
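The identity (7.32) is purely algebraic: it follows from $\int(r_1^2+r_2^2)\,\mathrm d\alpha=\sum_i\mathfrak h_i^2\alpha(X)$. The sketch below checks it on a discrete plan over $X=\mathbb R$ and also illustrates the observation used for point (ii), namely that atoms touching the vertex do not contribute to the integral in (7.31b). The plan, the total masses and all function names are hypothetical.

```python
import math

def d_pi(x1, x2):
    """Distance on the real line truncated at pi."""
    return min(abs(x1 - x2), math.pi)

def lhs(plan, masses):
    """Cost plus mass defect: the left-hand side of (7.32)."""
    cost = sum(w * (r1 ** 2 + r2 ** 2 - 2.0 * r1 * r2 * math.cos(d_pi(x1, x2)))
               for w, (x1, r1), (x2, r2) in plan)
    h1 = sum(w * r1 ** 2 for w, (_, r1), _ in plan)
    h2 = sum(w * r2 ** 2 for w, _, (_, r2) in plan)
    return cost + (masses[0] - h1) + (masses[1] - h2)

def rhs(plan, masses):
    """Total mass minus twice the correlation term: the right-hand side of (7.32)."""
    corr = sum(w * r1 * r2 * math.cos(d_pi(x1, x2)) for w, (x1, r1), (x2, r2) in plan)
    return masses[0] + masses[1] - 2.0 * corr

def restrict_away_from_vertex(plan):
    """Keep only the atoms with both radii positive, i.e. the restriction to C_o x C_o."""
    return [atom for atom in plan if atom[1][1] > 0.0 and atom[2][1] > 0.0]

# Hypothetical plan and total masses mu_1(X) = 3, mu_2(X) = 5 dominating its homogeneous marginals.
plan = [(1.0, (0.0, 1.0), (0.5, 2.0)),
        (0.5, (1.0, 0.0), (3.0, 1.0)),
        (2.0, (2.0, 0.7), (2.2, 0.0))]
masses = (3.0, 5.0)
print(lhs(plan, masses), rhs(plan, masses), rhs(restrict_away_from_vertex(plan), masses))
```

Only the total masses $\mu_i(X)$ enter the identity, which is why the sketch does not need the measures $\mu_i$ themselves.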


7.4 Gluing lemma and triangle inequality 

In this section we will prove that $\mathsf{HK}$ satisfies the triangle inequality and therefore is a
distance on $\mathcal M(X)$. The main technical step is provided by the following useful property
for plans in $\mathcal M(\mathfrak C^{\otimes N})$ with given homogeneous marginals, which is a simple application of
the rescaling invariance in (7.26).

Lemma 7.10 (Normalization of lifts). Let $\alpha\in\mathcal M_2(\mathfrak C^{\otimes N})$, $N\ge2$, be a plan satisfying

$$\mathfrak h_i^2\alpha=\mu_i\in\mathcal M(X)\ \text{for }i=1,\dots,N,\quad\text{and}\quad\theta_i=\int\mathsf d_{\mathfrak C}^2(\mathfrak y_{i-1},\mathfrak y_i)\,\mathrm d\alpha\ \text{for }i=2,\dots,N,\tag{7.33}$$

and let $j\in\{1,\dots,N\}$ be fixed. Then, it is possible to find a new plan $\tilde\alpha\in\mathcal M_2(\mathfrak C^{\otimes N})$
which still satisfies (7.33) and additionally the normalization of the $j$th lift,

$$\pi^j_\sharp(\tilde\alpha)=\delta_{\mathfrak o}+\mathsf p_\sharp(\mu_j\otimes\delta_1).\tag{7.34}$$
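Proof. By possibly adding $\delta_{\boldsymbol{\mathfrak o}}$, the Dirac mass at $\boldsymbol{\mathfrak o}=(\mathfrak o,\dots,\mathfrak o)$, to $\alpha$ (which does not modify (7.33)) we may suppose
that

$$\omega_j:=\alpha\big(\{\boldsymbol{\mathfrak y}\in\mathfrak C^{\otimes N}:\ \pi^j(\boldsymbol{\mathfrak y})=\mathfrak o\}\big)\ge1,$$

where $j$ is fixed as in the lemma. In order to find $\tilde\alpha$ it is sufficient to rescale $\alpha$ by the
function

$$\vartheta(\boldsymbol{\mathfrak y}):=\begin{cases}\mathsf r_j(\boldsymbol{\mathfrak y})&\text{if }\mathfrak y_j\ne\mathfrak o,\\\omega_j^{-1/2}&\text{otherwise.}\end{cases}\tag{7.35}$$

With the notation of (7.16) we set $\tilde\alpha:=\mathrm{dil}_{\vartheta,2}(\alpha)$ and we decompose $\alpha$ in the sum
$\alpha=\alpha'+\alpha''$ where $\alpha'=\alpha\llcorner\{\boldsymbol{\mathfrak y}\in\mathfrak C^{\otimes N}:\ \pi^j(\boldsymbol{\mathfrak y})=\mathfrak o\}$. For every $\zeta\in\mathrm B_b(\mathfrak C)$ we have

$$\begin{aligned}
\int\zeta(\mathfrak y_j)\,\mathrm d\tilde\alpha&=\int\zeta\big(\mathfrak y_j\cdot\vartheta^{-1}(\boldsymbol{\mathfrak y})\big)\vartheta^2(\boldsymbol{\mathfrak y})\,\mathrm d\alpha=\int\zeta(\mathfrak o)\,\omega_j^{-1}\,\mathrm d\alpha'+\int\zeta\big([x_j,r_j/\vartheta(\boldsymbol{\mathfrak y})]\big)\vartheta^2(\boldsymbol{\mathfrak y})\,\mathrm d\alpha''\\
&=\zeta(\mathfrak o)+\int\zeta([x_j,1])\,r_j^2\,\mathrm d\alpha''=\zeta(\mathfrak o)+\int\zeta([x_j,1])\,r_j^2\,\mathrm d\alpha=\zeta(\mathfrak o)+\int\zeta\circ\mathsf p\,\mathrm d(\mu_j\otimes\delta_1),
\end{aligned}$$

which yields (7.34). □

We can now prove a general form of the so-called "gluing lemma" that is the natural
extension of the well known result for transport problems (see e.g. [2, Lemma 5.3.4]).
Here its formulation is strongly related to the rescaling invariance of optimal plans given
by Lemma 7.10.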


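Lemma 7.11 (Gluing lemma). Let us consider a finite collection of measures $\mu_i\in\mathcal M(X)$
for $i=1,\dots,N$ with $N\ge2$. Set

$$\Theta:=\sqrt{\mu_1(X)}+\sum_{i=2}^N\mathsf{HK}(\mu_{i-1},\mu_i)\qquad\text{and}\qquad M^2:=\sum_{i=1}^N\mu_i(X).\tag{7.36}$$

Then there exist plans $\alpha^1,\alpha^2\in\mathcal P_2(\mathfrak C^{\otimes N})$ such that

$$\mathfrak h_i^2\alpha^k=\mu_i\ \text{for }i=1,\dots,N\quad\text{and}\quad\int\mathsf d_{\mathfrak C}^2(\mathfrak y_{i-1},\mathfrak y_i)\,\mathrm d\alpha^k=\mathsf{HK}^2(\mu_{i-1},\mu_i)\ \text{for }i=2,\dots,N.\tag{7.37}$$

Moreover, the plans $\alpha^k$ satisfy the following additional conditions:

$$\alpha^1\ \text{is concentrated on}\ \Big\{\boldsymbol{\mathfrak y}\in\mathfrak C^{\otimes N}:\ \sum_i\mathsf r_i^2(\boldsymbol{\mathfrak y})\le M^2\Big\},\tag{7.38}$$

$$\alpha^2\ \text{is concentrated on}\ \Big\{\boldsymbol{\mathfrak y}\in\mathfrak C^{\otimes N}:\ \sup_i\mathsf r_i(\boldsymbol{\mathfrak y})\le\Theta\Big\}=(\mathfrak C[\Theta])^{\otimes N}.\tag{7.39}$$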

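Proof. We first construct a plan $\alpha$ satisfying (7.37), then suitable rescalings will provide
$\alpha^k$ satisfying (7.38) or (7.39). In order to clarify the argument, we consider $N$ copies
$X_1,X_2,\dots,X_N$ of $X$ (and for $\mathfrak C$ in a similar way) so that $X^{\otimes N}=\prod_{i=1}^NX_i$.

We argue by induction; the starting case $N=2$ is covered by Theorem 7.6 and
Lemma 7.10. Let us now discuss the induction step, by assuming that the thesis holds
for $N$ and proving it for $N+1$. We can thus find an optimal plan $\alpha^N$ such that (7.37)
holds, and another optimal plan $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_N,\mu_{N+1})$ for the couple $\mu_N,\mu_{N+1}$. Applying
the normalization Lemma 7.10 to $\alpha^N$ (with $j=N$) and to $\alpha$ (with $j=1$) we can assume
that

$$\pi^N_\sharp(\alpha^N)=\delta_{\mathfrak o}+\mathsf p_\sharp(\mu_N\otimes\delta_1)=\pi^1_\sharp(\alpha).$$

Therefore we can apply the standard gluing Lemma (see e.g. [2,
Lemma 5.3.2] and [H Lemma 2.2] in the case of arbitrary topological spaces) obtaining a
new plan $\alpha^{N+1}$ satisfying $\pi^{1,2,\dots,N}_\sharp\alpha^{N+1}=\alpha^N$ and $\pi^{N,N+1}_\sharp\alpha^{N+1}=\alpha$; in particular, $\alpha^{N+1}$
satisfies (7.37).

A further application of the rescaling (7.26) with $\vartheta$ as in (5.26a) yields a plan $\alpha^1$
satisfying also (7.38).

In order to obtain $\alpha^2$, we can assume $\alpha(\{|\boldsymbol{\mathfrak y}|=0\})=0$ and set $\alpha^2=\mathrm{dil}_{\vartheta,2}(\alpha)$, where
we use the rescaling function

$$\vartheta(\boldsymbol{\mathfrak y}):=r_*^{-1}|\boldsymbol{\mathfrak y}|_\infty=r_*^{-1}\sup_i\mathsf r_i(\boldsymbol{\mathfrak y})\qquad\text{with}\qquad r_*^2:=\int_{\mathfrak C^{\otimes N}}|\boldsymbol{\mathfrak y}|_\infty^2\,\mathrm d\alpha.$$

To obtain (7.39) it remains to estimate $r_*$. We consider arbitrary coefficients $\theta_i>0$ and
use for $n=2,\dots,N$ the inequality

$$\begin{aligned}
r_n\le r_1+\sum_{i=2}^n|r_i-r_{i-1}|&\le\Big(\sum_{i=1}^n\theta_i^{-1}\Big)^{1/2}\Big(\theta_1r_1^2+\sum_{i=2}^n\theta_i|r_i-r_{i-1}|^2\Big)^{1/2}\\
&\le\Big(\sum_{i=1}^N\theta_i^{-1}\Big)^{1/2}\Big(\theta_1r_1^2+\sum_{i=2}^N\theta_i\,\mathsf d_{\mathfrak C}^2(\mathfrak y_i,\mathfrak y_{i-1})\Big)^{1/2},
\end{aligned}$$

which yields

$$r_*^2=\int_{\mathfrak C^{\otimes N}}|\boldsymbol{\mathfrak y}|_\infty^2\,\mathrm d\alpha\le\Big(\sum_{i=1}^N\theta_i^{-1}\Big)\int_{\mathfrak C^{\otimes N}}\Big(\theta_1r_1^2+\sum_{i=2}^N\theta_i\,\mathsf d_{\mathfrak C}^2(\mathfrak y_i,\mathfrak y_{i-1})\Big)\,\mathrm d\alpha=\Big(\sum_{i=1}^N\theta_i^{-1}\Big)\Big(\theta_1\mu_1(X)+\sum_{i=2}^N\theta_i\,\mathsf{HK}^2(\mu_i,\mu_{i-1})\Big);$$

optimizing with respect to $\theta_i>0$ we obtain the value of $\Theta$ given by (7.36). □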
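The last optimization is an instance of the Cauchy-Schwarz inequality: for positive numbers $a_i$, the minimum of $\big(\sum_i\theta_i^{-1}\big)\big(\sum_i\theta_ia_i^2\big)$ over $\theta_i>0$ equals $\big(\sum_ia_i\big)^2$ and is attained at $\theta_i=1/a_i$. With $a_1=\sqrt{\mu_1(X)}$ and $a_i=\mathsf{HK}(\mu_{i-1},\mu_i)$ this gives $r_*\le\Theta$. The sketch below checks this numerically; the values of $a_i$ are hypothetical placeholders and the helper names are ours.

```python
import math
import random

def bound(theta, a):
    """(sum 1/theta_i) * (sum theta_i a_i^2): the upper bound on r_*^2 for given coefficients."""
    return sum(1.0 / t for t in theta) * sum(t * ai * ai for t, ai in zip(theta, a))

def optimal_theta(a):
    """Minimizing coefficients theta_i = 1/a_i; assumes every a_i > 0."""
    return [1.0 / ai for ai in a]

# Hypothetical values standing for sqrt(mu_1(X)), HK(mu_1, mu_2), HK(mu_2, mu_3), HK(mu_3, mu_4).
a = [1.5, 0.2, 0.7, 0.4]
best = bound(optimal_theta(a), a)

rng = random.Random(2)
never_better = all(
    bound([rng.uniform(0.05, 20.0) for _ in a], a) >= best * (1.0 - 1e-12)
    for _ in range(2000)
)
print(best, sum(a) ** 2, never_better)
```

If some $a_i$ vanishes, the corresponding coefficient has to be sent to infinity and the minimum becomes an infimum; the sketch does not cover that degenerate case.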

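The next remark gives a similar rescaling result for probability couplings $\beta\in\mathcal P_2(\mathfrak C^{\otimes N})$.

Remark 7.12. In a completely similar way (see [2, Lemma 5.3.4]), for $N\ge2$, a finite
collection of measures $\mu_i\in\mathcal M(X)$, and coefficients $\theta_i>0$, $i=1,\dots,N$, there exists a
plan $\beta\in\mathcal P_2(\mathfrak C^{\otimes N})$ concentrated on $\{\boldsymbol{\mathfrak y}\in\mathfrak C^{\otimes N}:\ \sup_i\mathsf r_i(\boldsymbol{\mathfrak y})\le\Xi\}$ with

$$\Xi:=\sqrt{\mu_1(X)}+\sum_{i=2}^N\mathsf{HK}(\mu_1,\mu_i),\tag{7.40}$$

such that

$$\mathfrak h_i^2\beta=\mu_i\qquad\text{and}\qquad\int\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_i)\,\mathrm d\beta=\mathsf{HK}^2(\mu_1,\mu_i)\qquad\text{for }i=1,\dots,N.\tag{7.41}$$

□
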
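Arguing as in the proof of Corollary 7.7 one immediately obtains the following result,
which will be needed for the proof of Theorem 8.8 and for the subsequent corollary.

Corollary 7.13. For every finite collection of measures $\mu_i\in\mathcal M(X)$, $i=1,\dots,N$, there
exist $\alpha_i,\beta_i\in\mathcal P_2(\mathfrak C)$ with $\alpha_i$ concentrated in $\mathfrak C[r]$ where $r=\min(M,\Theta)$ is given as in (7.36)
and $\beta_i$ concentrated in $\mathfrak C[\Xi]$ given by (7.40) such that

$$\mathfrak h^2\alpha_i=\mu_i\quad\text{and}\quad\mathfrak h^2\beta_i=\mu_i\qquad\text{for }i=1,\dots,N,$$

$$\mathsf{HK}(\mu_{i-1},\mu_i)=\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_{i-1},\alpha_i)\quad\text{and}\quad\mathsf{HK}(\mu_1,\mu_i)=\mathsf W_{\mathsf d_{\mathfrak C}}(\beta_1,\beta_i)\qquad\text{for }i=2,\dots,N.$$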

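We are now in the position to show that the functional $\mathsf{HK}$ is a true distance on $\mathcal M(X)$,
where we deduce the triangle inequality from that for $\mathsf W_{\mathsf d_{\mathfrak C}}$ by using normalized lifts.

Corollary 7.14 ($\mathsf{HK}$ is a distance). $\mathsf{HK}$ is a distance on $\mathcal M(X)$; in particular, for every
$\mu_1,\mu_2,\mu_3\in\mathcal M(X)$ we have the triangle inequality

$$\mathsf{HK}(\mu_1,\mu_3)\le\mathsf{HK}(\mu_1,\mu_2)+\mathsf{HK}(\mu_2,\mu_3).\tag{7.42}$$

Proof. It is immediate to check that $\mathsf{HK}$ is symmetric and $\mathsf{HK}(\mu_1,\mu_2)=0$ if and only if
$\mu_1=\mu_2$. In order to check (7.42) it is sufficient to apply the previous corollary 7.13 to
find measures $\alpha_i\in\mathcal P_2(\mathfrak C)$, $i=1,2,3$, such that $\mathfrak h^2\alpha_i=\mu_i$ and $\mathsf{HK}(\mu_1,\mu_2)=\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)$
and $\mathsf{HK}(\mu_2,\mu_3)=\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_2,\alpha_3)$. Applying the triangle inequality for $\mathsf W_{\mathsf d_{\mathfrak C}}$ we obtain

$$\mathsf{HK}(\mu_1,\mu_3)\le\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_3)\le\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)+\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_2,\alpha_3)=\mathsf{HK}(\mu_1,\mu_2)+\mathsf{HK}(\mu_2,\mu_3).\qquad\square$$

As a consequence of the previous two results, the map $\mathfrak h^2:\mathcal P_2(\mathfrak C)\to\mathcal M(X)$ is a metric
submersion.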

7.5 Metric and topological properties 

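In this section we will assume that the topology $\tau$ on $X$ is induced by $\mathsf d$ and that $(X,\mathsf d)$ is
separable, so that also $(\mathfrak C,\mathsf d_{\mathfrak C})$ is separable. Notice that in this case there is no difference
between weak and narrow topology in $\mathcal M(X)$. Moreover, since $X$ is separable, $\mathcal M(X)$
equipped with the weak topology is metrizable, so that converging sequences are sufficient
to characterize the weak-narrow topology.

It turns out [2, Chap. 7] that $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ is a separable metric space: convergence of a
sequence $(\alpha_n)_{n\in\mathbb N}$ to a limit measure $\alpha$ in $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ corresponds to weak-narrow
convergence in $\mathcal P(\mathfrak C)$ and convergence of the quadratic moments, or, equivalently, to convergence
of integrals of continuous functions with quadratic growth, i.e.

$$\lim_{n\to\infty}\int_{\mathfrak C}\varphi(\mathfrak y)\,\mathrm d\alpha_n=\int_{\mathfrak C}\varphi(\mathfrak y)\,\mathrm d\alpha\qquad\text{for every }\varphi\in\mathrm C(\mathfrak C)\text{ with }|\varphi(\mathfrak y)|\le A+B\,\mathsf r^2(\mathfrak y),\tag{7.43}$$

for some constants $A,B\ge0$ depending on $\varphi$. Recall that $\mathsf r^2(\mathfrak y)=\mathsf d_{\mathfrak C}^2(\mathfrak y,\mathfrak o)$.

Theorem 7.15 ($\mathsf{HK}$ metrizes the weak topology on $\mathcal M(X)$). $\mathsf{HK}$ induces the weak-narrow
topology on $\mathcal M(X)$: a sequence $(\mu_n)_{n\in\mathbb N}\subset\mathcal M(X)$ converges to a measure $\mu$ in $(\mathcal M(X),\mathsf{HK})$ if and
only if $(\mu_n)_{n\in\mathbb N}$ converges weakly to $\mu$ in duality with continuous and bounded functions.
In particular, the metric space $(\mathcal M(X),\mathsf{HK})$ is separable.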















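Proof. Let us first suppose that $\lim_{n\to\infty}\mathsf{HK}(\mu_n,\mu)=0$. We argue by contradiction and we
assume that there exist a function $\zeta\in\mathrm C_b(X)$ and a subsequence (still denoted by $\mu_n$)
such that

$$\inf_n\Big|\int_X\zeta\,\mathrm d\mu_n-\int_X\zeta\,\mathrm d\mu\Big|>0.\tag{7.44}$$

The first estimate of (7.29) and the triangle inequality show that

$$\limsup_{n\to\infty}\mu_n(X)\le\limsup_{n\to\infty}\big(\mathsf{HK}(\mu_n,\mu)+\mathsf{HK}(\mu,\eta_0)\big)^2=\mu(X),$$

so that $\sup_n\mu_n(X)=M^2<\infty$. By Corollary 7.7 we can find measures $\tilde\alpha_n,\alpha_n\in\mathcal P_2(\mathfrak C)$
concentrated on $\mathfrak C[2M]$ such that

$$\mathfrak h^2\tilde\alpha_n=\mu,\qquad\mathfrak h^2\alpha_n=\mu_n,\qquad\mathsf W_{\mathsf d_{\mathfrak C}}(\tilde\alpha_n,\alpha_n)=\mathsf{HK}(\mu,\mu_n).$$

By Lemma 7.3 the sequence $(\tilde\alpha_n)_{n\in\mathbb N}$ is equally tight in $\mathcal P_2(\mathfrak C)$; since it is also uniformly
bounded there exists a subsequence $k\mapsto n_k$ such that $\tilde\alpha_{n_k}$ weakly converges to a limit
$\alpha\in\mathcal P_2(\mathfrak C)$. Since $\tilde\alpha_{n_k}$ is concentrated on $\mathfrak C[2M]$ we also have $\lim_{k\to\infty}\mathsf W_{\mathsf d_{\mathfrak C}}(\tilde\alpha_{n_k},\alpha)=0$ and
therefore $\mathfrak h^2\alpha=\mu$, $\lim_{k\to\infty}\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_{n_k},\alpha)=0$.

We thus have

$$\lim_{k\to\infty}\int_X\zeta(x)\,\mathrm d\mu_{n_k}=\lim_{k\to\infty}\int_{\mathfrak C}\zeta(\mathsf x)\,\mathsf r^2\,\mathrm d\alpha_{n_k}=\int_{\mathfrak C}\zeta(\mathsf x)\,\mathsf r^2\,\mathrm d\alpha=\int_X\zeta(x)\,\mathrm d\mu,$$

which contradicts (7.44).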

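In order to prove the converse implication, let us suppose that $\mu_n$ is converging
weakly to $\mu$ in $\mathcal M(X)$. If $\mu$ is the null measure $\eta_0=0$, then $\lim_{n\to\infty}\mu_n(X)=0$ so
that $\lim_{n\to\infty}\mathsf{HK}(\mu_n,\mu)=0$ by (7.29).

So we can suppose that $m:=\mu(X)>0$ and have $m_n:=\mu_n(X)\ge m/2>0$ for
sufficiently large $n$. We now consider the measures $\alpha_n,\alpha\in\mathcal P(\mathfrak C)$ given by

$$\alpha_n:=\mathsf p_\sharp\big(m_n^{-1}\mu_n\otimes\delta_{\sqrt{m_n}}\big)\qquad\text{and}\qquad\alpha:=\mathsf p_\sharp\big(m^{-1}\mu\otimes\delta_{\sqrt m}\big).$$

Since $\mathfrak h^2\alpha_n=\mu_n$ and $\mathfrak h^2\alpha=\mu$, by (7.28) we have $\mathsf{HK}(\mu_n,\mu)\le\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_n,\alpha)$. Since $m_n^{-1}\mu_n$ is
weakly converging to $m^{-1}\mu$ in $\mathcal P(X)$ and $m_n\to m$, it is easy to check that $m_n^{-1}\mu_n\otimes\delta_{\sqrt{m_n}}$
weakly converges to $m^{-1}\mu\otimes\delta_{\sqrt m}$ in $\mathcal P(Y)$ and therefore $\alpha_n$ weakly converges to $\alpha$ in $\mathcal P(\mathfrak C)$
by the continuity of the projection $\mathsf p$. Hence, in order to conclude that $\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_n,\alpha)\to0$ it
is now sufficient to prove the convergence of their quadratic moments with respect to the
vertex $\mathfrak o$. However, this is immediate because of

$$\lim_{n\to\infty}\int_{\mathfrak C}\mathsf d_{\mathfrak C}^2(\mathfrak y,\mathfrak o)\,\mathrm d\alpha_n=\lim_{n\to\infty}\int_{\mathfrak C}\mathsf r^2\,\mathrm d\alpha_n=\lim_{n\to\infty}m_n=m=\int_{\mathfrak C}\mathsf d_{\mathfrak C}^2(\mathfrak y,\mathfrak o)\,\mathrm d\alpha.\qquad\square$$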


Corollary 7.16 (Compactness). If $(X,\mathsf d)$ is a compact metric space then $(\mathcal M(X),\mathsf{HK})$ is
a proper metric space, i.e. every bounded set is relatively compact.

Proof. It is sufficient to notice that a set $\mathcal C\subset\mathcal M(X)$ is bounded w.r.t. $\mathsf{HK}$ if and only
if $\sup_{\mu\in\mathcal C}\mu(X)<\infty$. Then the classical weak sequential compactness of closed bounded
sets in $\mathcal M(X)$ gives the result. □

The following completeness result for $(\mathcal M(X),\mathsf{HK})$ is obtained by suitable liftings of
measures $\mu_i$ to probability measures $\alpha_i\in\mathcal P_2(\mathfrak C)$, supported in some $\mathfrak C[\Theta]$. Then the
completeness of the Wasserstein space $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ is exploited.

Theorem 7.17 (Completeness of $(\mathcal M(X),\mathsf{HK})$). If $(X,\mathsf d)$ is complete then the metric space
$(\mathcal M(X),\mathsf{HK})$ is complete.

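Proof. We have to prove that every Cauchy sequence $(\mu_n)_{n\in\mathbb N}$ in $(\mathcal M(X),\mathsf{HK})$ admits a
convergent subsequence. By exploiting the Cauchy property, we can find an increasing
sequence of integers $k\mapsto n(k)$ such that $\mathsf{HK}(\mu_m,\mu_{m'})\le2^{-k}$ whenever $m,m'\ge n(k)$ and
we consider the subsequence $\mu'_i:=\mu_{n(i)}$,
so that

$$\sum_{i=2}^N\mathsf{HK}(\mu'_{i-1},\mu'_i)\le1,$$

and by applying the Gluing Lemma 7.11, for every $N>0$ we can find measures $\alpha_i^N\in\mathcal P_2(\mathfrak C)$,
$i=1,\dots,N$, concentrated on $\mathfrak C[\Theta]$ with $\Theta:=\sqrt{\mu'_1(X)}+1$, such that

$$\mathfrak h^2\alpha_i^N=\mu'_i\quad\text{and}\quad \mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_i^N,\alpha_{i-1}^N)=\mathsf{HK}(\mu'_i,\mu'_{i-1}).$$

For every $i$ the sequence $N\mapsto\alpha_i^N\in\mathcal P_2(\mathfrak C)$ is tight by Lemma 7.3 and concentrated on the
bounded set $\mathfrak C[\Theta]$, so that by Prokhorov Theorem it is relatively compact in $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$.

By a standard diagonal argument, we can find a further increasing subsequence $m\mapsto N(m)$
and limit measures $\alpha_i\in\mathcal P_2(\mathfrak C)$ such that $\lim_{m\to\infty}\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_i^{N(m)},\alpha_i)=0$. The convergence
with respect to $\mathsf W_{\mathsf d_{\mathfrak C}}$ yields that

$$\mathfrak h^2\alpha_i=\mu'_i,\qquad \mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_i,\alpha_{i-1})=\mathsf{HK}(\mu'_i,\mu'_{i-1}).$$

It follows that $i\mapsto\alpha_i$ is a Cauchy sequence in $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ which is a complete metric
space [2, Prop. 7.1.5] and therefore there exists $\alpha\in\mathcal P_2(\mathfrak C)$ such that $\lim_{i\to\infty}\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_i,\alpha)=0$.

Setting $\mu:=\mathfrak h^2\alpha\in\mathcal M(X)$ we thus obtain $\lim_{i\to\infty}\mathsf{HK}(\mu'_i,\mu)=0$. □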

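We conclude this section by proving a simple comparison estimate for $\mathsf{HK}$ with the
Bounded Lipschitz metric (cf. [17, Sec. 11.3]), see also [21, Thm. 3]. The Bounded Lipschitz
metric is defined via

$$\mathsf{BL}(\mu_1,\mu_2):=\sup\Big\{\int_X\zeta\,d(\mu_1-\mu_2):\ \zeta\in\mathrm{Lip}_b(X),\ \sup_X|\zeta|+\mathrm{Lip}(\zeta,X)\le1\Big\}. \tag{7.45}$$

We do not claim that the constant $C_*$ below is optimal.

Proposition 7.18. For every $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{BL}(\mu_1,\mu_2)\le C_*\Big(\sum_i\mu_i(X)\Big)^{1/2}\,\mathsf{HK}(\mu_1,\mu_2),\quad\text{where } C_*:=\sqrt{2+\pi^2/2}. \tag{7.46}$$

Proof. Let $\xi\in\mathrm{Lip}_b(X)$ with $\sup_X|\xi|+\mathrm{Lip}(\xi,X)\le1$ and let $\alpha\in\mathcal P(\mathfrak C\times\mathfrak C)$ be optimal for (7.25)
and concentrated on $\{r_1^2+r_2^2\le R^2\}$ with $R^2:=\mu_1(X)+\mu_2(X)$. Notice that

$$|\xi(x_1)-\xi(x_2)|\le\min(\mathsf d(x_1,x_2),2)\le2\,\mathsf d_2(x_1,x_2)\le2\,\mathsf d_\pi(x_1,x_2)\le2\pi\sin(\mathsf d_\pi(x_1,x_2)/2).$$

We consider the function $\zeta:\mathfrak C\to\mathbb R$ defined by $\zeta(\mathfrak y):=\xi(x)r^2$. Hence, $\zeta$ satisfies

$$\begin{aligned}
|\zeta(\mathfrak y_1)-\zeta(\mathfrak y_2)|&\le|\xi(x_1)-\xi(x_2)|\,r_1r_2+\big(|\xi(x_1)|r_1+|\xi(x_2)|r_2\big)|r_1-r_2|\\
&\le2\pi\sin(\mathsf d_\pi(x_1,x_2)/2)\,r_1r_2+(r_1+r_2)|r_1-r_2|\\
&\le\sqrt{(r_1+r_2)^2+\pi^2r_1r_2}\;\mathsf d_{\mathfrak C}(\mathfrak y_1,\mathfrak y_2)\le C_*\sqrt{r_1^2+r_2^2}\;\mathsf d_{\mathfrak C}(\mathfrak y_1,\mathfrak y_2).
\end{aligned}$$

Since the optimal plan $\alpha$ is concentrated on $\{r_1^2+r_2^2\le R^2\}$ we obtain

$$\Big|\int_X\xi\,d(\mu_1-\mu_2)\Big|=\Big|\int\big(\zeta(\mathfrak y_1)-\zeta(\mathfrak y_2)\big)\,d\alpha\Big|\le\int|\zeta(\mathfrak y_1)-\zeta(\mathfrak y_2)|\,d\alpha\le C_*R\int\mathsf d_{\mathfrak C}(\mathfrak y_1,\mathfrak y_2)\,d\alpha\le C_*R\,\mathsf{HK}(\mu_1,\mu_2).\quad□$$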
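The pointwise estimate used in the proof of Proposition 7.18 can be sanity-checked numerically. The following is only a sketch under stated assumptions: the helper names (`cone_distance`, `middle_term`, `final_bound`) are hypothetical, the base distance enters only through the number `d`, and random sampling is of course no substitute for the Cauchy-Schwarz argument in the proof.

```python
import math
import random

C_STAR = math.sqrt(2.0 + math.pi ** 2 / 2.0)  # constant C_* from (7.46)


def cone_distance(r1, r2, d):
    """Cone distance d_C([x1, r1], [x2, r2]) when d(x1, x2) = d."""
    d_pi = min(d, math.pi)  # truncated distance min(d, pi)
    return math.sqrt((r1 - r2) ** 2 + 4.0 * r1 * r2 * math.sin(d_pi / 2.0) ** 2)


def middle_term(r1, r2, d):
    """2*pi*sin(d_pi/2)*r1*r2 + (r1 + r2)*|r1 - r2|, the bound on |zeta(y1) - zeta(y2)|."""
    d_pi = min(d, math.pi)
    return 2.0 * math.pi * math.sin(d_pi / 2.0) * r1 * r2 + (r1 + r2) * abs(r1 - r2)


def final_bound(r1, r2, d):
    """C_* * sqrt(r1^2 + r2^2) * d_C(y1, y2)."""
    return C_STAR * math.sqrt(r1 ** 2 + r2 ** 2) * cone_distance(r1, r2, d)


random.seed(0)  # fixed seed so that the check is reproducible
worst_gap = -math.inf
for _ in range(50_000):
    r1, r2 = random.uniform(0.0, 5.0), random.uniform(0.0, 5.0)
    d = random.uniform(0.0, 6.0)
    worst_gap = max(worst_gap, middle_term(r1, r2, d) - final_bound(r1, r2, d))

# A positive value (beyond rounding) would indicate a violation of the estimate.
print("largest value of (middle term - final bound):", worst_gap)
```

The sampled ranges for the radii and the distance are arbitrary choices made for illustration.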


7.6 Hellinger-Kantorovich distance and Entropy-Transport functionals

In this section we will establish our main result connecting $\mathsf{HK}$ with $\mathsf{LET}$.

It is clear that the definition of $\mathsf{HK}$ does not change if we replace the distance $\mathsf d$ on $X$
by its truncation $\mathsf d_\pi=\mathsf d\wedge\pi$. It is less obvious that we can even replace the threshold $\pi$
with $\pi/2$ and use the distance $\mathsf d_{\pi/2,\mathfrak C}$ of Remark 7.2 in the formulation of the Hellinger-
Kantorovich Problem 7.4. This property is related to the particular structure of the
homogeneous marginals (which are not affected by masses concentrated in the vertex $\mathfrak o$
of the cone $\mathfrak C$); in [27, Sect. 3.2] it is called the presence of a sufficiently large reservoir,
which shows that transport over distances larger than $\pi/2$ is never optimal, since it is
cheaper to transport into or out of the reservoir in $\mathfrak o$. This will provide an essential piece
of information to connect the $\mathsf{HK}$ and the $\mathsf{LET}$ functionals.

In order to prove that transport only occurs over distances $\le\pi/2$ we define the subset

$$\boldsymbol{\mathfrak C}':=\{\mathsf d_{\pi/2,\mathfrak C}<\mathsf d_{\mathfrak C}\}=\{(\mathfrak y_1,\mathfrak y_2)\in\mathfrak C_o\times\mathfrak C_o:\ \mathsf d(x_1,x_2)>\pi/2\} \tag{7.47}$$

and consider the partition $(\boldsymbol{\mathfrak C}',\boldsymbol{\mathfrak C}'')$ of $\boldsymbol{\mathfrak C}=\mathfrak C\times\mathfrak C$, where $\boldsymbol{\mathfrak C}'':=\boldsymbol{\mathfrak C}\setminus\boldsymbol{\mathfrak C}'=\{\mathsf d_{\pi/2,\mathfrak C}=\mathsf d_{\mathfrak C}\}$.
Observe that

$$\boldsymbol{\mathfrak C}''_o:=\boldsymbol{\mathfrak C}''\cap(\mathfrak C_o\times\mathfrak C_o)=\{(\mathfrak y_1,\mathfrak y_2)\in\mathfrak C_o\times\mathfrak C_o:\ \mathsf d(x_1,x_2)\le\pi/2\}. \tag{7.48}$$

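In the following lemma we show that minimizers $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ are concentrated on
$\boldsymbol{\mathfrak C}''$, i.e. $\alpha(\boldsymbol{\mathfrak C}')=0$, which holds if and only if $\alpha_o=\alpha\llcorner(\mathfrak C_o\times\mathfrak C_o)$ is concentrated on $\boldsymbol{\mathfrak C}''_o$. To
handle the mass that is transported into or out of $\mathfrak o$, we use the continuous projections

$$\mathfrak g_i:\boldsymbol{\mathfrak C}\to\boldsymbol{\mathfrak C},\qquad \mathfrak g_1(\mathfrak y_1,\mathfrak y_2):=(\mathfrak y_1,\mathfrak o),\qquad \mathfrak g_2(\mathfrak y_1,\mathfrak y_2):=(\mathfrak o,\mathfrak y_2). \tag{7.49}$$

Lemma 7.19 (Plan restriction). For every $\alpha\in\mathcal M(\boldsymbol{\mathfrak C})$ the plan

$$\bar\alpha:=\alpha''+(\mathfrak g_1)_\sharp\alpha'+(\mathfrak g_2)_\sharp\alpha'\quad\text{with}\quad \alpha':=\alpha\llcorner\boldsymbol{\mathfrak C}',\ \alpha'':=\alpha\llcorner\boldsymbol{\mathfrak C}'', \tag{7.50}$$

is concentrated on $\boldsymbol{\mathfrak C}''$, has the same homogeneous marginals as $\alpha$, i.e. $\mathfrak h^2_i\bar\alpha=\mathfrak h^2_i\alpha$, and

$$\int_{\boldsymbol{\mathfrak C}}\mathsf d_{\mathfrak C}^2\,d\bar\alpha=\int_{\boldsymbol{\mathfrak C}}\mathsf d_{\pi/2,\mathfrak C}^2\,d\alpha\le\int_{\boldsymbol{\mathfrak C}}\mathsf d_{\mathfrak C}^2\,d\alpha, \tag{7.51}$$

where the inequality is strict if $\alpha(\boldsymbol{\mathfrak C}')>0$. In particular for every $\mu_1,\mu_2\in\mathcal M(X)$

$$\mathsf{HK}^2(\mu_1,\mu_2)=\min\Big\{\int\mathsf d_{\pi/2,\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,d\alpha:\ \alpha\in\mathcal M_2(\boldsymbol{\mathfrak C}),\ \mathfrak h^2_i\alpha=\mu_i\Big\}. \tag{7.52}$$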

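Proof. For every $\zeta\in B_b(X)$, since $r_1\circ\mathfrak g_2=0$ and $r_1\circ\mathfrak g_1=r_1$, we have

$$\begin{aligned}
\int\zeta\,d(\mathfrak h^2_1\bar\alpha)&=\int\zeta(x_1)r_1^2\,d\bar\alpha=\int\zeta(x_1)r_1^2\,d\alpha''+\sum_k\int\zeta(x_1(\mathfrak g_k))\,r_1^2(\mathfrak g_k)\,d\alpha'\\
&=\int\zeta(x_1)r_1^2\,d\alpha''+\int\zeta(x_1)r_1^2\,d\alpha'=\int\zeta(x_1)r_1^2\,d\alpha=\int\zeta\,d(\mathfrak h^2_1\alpha),
\end{aligned}$$

so that $\mathfrak h^2_1\bar\alpha=\mathfrak h^2_1\alpha$; a similar calculation holds for $\mathfrak h^2_2$, so that $\bar\alpha$ has homogeneous marginals $\mu_1,\mu_2$ whenever $\alpha$ does. Moreover,
if $(\mathfrak y_1,\mathfrak y_2)\in\boldsymbol{\mathfrak C}'$ we easily get

$$\mathsf d_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)>r_1^2+r_2^2=\mathsf d_{\mathfrak C}^2(\mathfrak g_1(\mathfrak y_1,\mathfrak y_2))+\mathsf d_{\mathfrak C}^2(\mathfrak g_2(\mathfrak y_1,\mathfrak y_2))$$

so that whenever $\alpha(\boldsymbol{\mathfrak C}')>0$ we get

$$\int\mathsf d_{\mathfrak C}^2\,d\bar\alpha=\int\big(\mathsf d_{\mathfrak C}^2\circ\mathfrak g_1+\mathsf d_{\mathfrak C}^2\circ\mathfrak g_2\big)\,d\alpha'+\int\mathsf d_{\mathfrak C}^2\,d\alpha''<\int\mathsf d_{\mathfrak C}^2\,d\alpha'+\int\mathsf d_{\mathfrak C}^2\,d\alpha''=\int\mathsf d_{\mathfrak C}^2\,d\alpha,$$

which proves (7.51) and characterizes the equality case. (7.52) then follows by (7.51) and
the fact that the homogeneous marginals of $\alpha$ and $\bar\alpha$ coincide. □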

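In (7.52) we have established that $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_1,\mu_2)$ has support in $\boldsymbol{\mathfrak C}''$. This allows
us to prove the identity $\mathsf{LET}=\mathsf{HK}^2$. For this, we introduce the open set $\boldsymbol G\subset\boldsymbol{\mathfrak C}$ via

$$\boldsymbol G:=\big\{([x_1,r_1],[x_2,r_2])\in\boldsymbol{\mathfrak C}:\ r_1r_2\ne0,\ \mathsf d(x_1,x_2)<\pi/2\big\}$$

and note that $r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))>0$ in $\boldsymbol G$. Recall also $\boldsymbol{\mathsf p}=\mathsf p\otimes\mathsf p:Y\times Y\to\boldsymbol{\mathfrak C}$, where $\mathsf p$ is
defined in (7.7).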

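Theorem 7.20 ($\mathsf{HK}^2=\mathsf{LET}$). For all $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{HK}^2(\mu_1,\mu_2)=\mathsf{LET}(\mu_1,\mu_2), \tag{7.53}$$

and $\alpha(\boldsymbol{\mathfrak C}')=0$ for every optimal solution $\alpha\in\mathcal M(\boldsymbol{\mathfrak C})$ of Problem 7.4 or of (7.31a,b). Moreover,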

(i) $\alpha\in\mathcal M(\boldsymbol{\mathfrak C})$ is an optimal plan for (7.31a,b) if and only if $\alpha(\boldsymbol{\mathfrak C}')=0$ and $\boldsymbol{\mathsf y}_\sharp\big(\alpha\llcorner(\mathfrak C_o\times\mathfrak C_o)\big)$
is an optimal plan for (6.30)-(6.29).

(ii) $\bar\alpha$ is any optimal plan for (6.31) if and only if $\alpha:=\boldsymbol{\mathsf p}_\sharp\bar\alpha$ is an optimal plan
for the Hellinger-Kantorovich Problem 7.4.

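(iii) If $\gamma\in\mathcal M(X\times X)$ belongs to $\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$ and $\varrho_i:X\to[0,\infty)$ are Borel maps so
that $\mu_i=\varrho_i\gamma_i+\mu_i^\perp$, then $\beta:=\big(\boldsymbol{\mathsf p}\circ(x_1,\varrho_1^{1/2}(x_1);x_2,\varrho_2^{1/2}(x_2))\big)_\sharp\gamma$ is an optimal plan
for (7.31a)-(7.31b), and it satisfies $r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))=1$ $\beta$-a.e.; in particular $\beta$
is concentrated on $\boldsymbol G$.

(iv) If $\alpha\in\mathcal M(\boldsymbol{\mathfrak C})$ is an optimal plan for Problem 7.4 then $\tilde\alpha:=\alpha\llcorner\boldsymbol G$ is an optimal
plan for (7.31a,b). Moreover,

• the plan $\beta:=\mathrm{dil}_{\vartheta,2}(\tilde\alpha)$, with $\vartheta:=\big(r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\big)^{1/2}$, is an optimal plan
satisfying $r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))=1$ $\beta$-a.e.

• If $(X,\tau)$ is separable and metrizable, $\gamma:=(x_1,x_2)_\sharp\beta$ belongs to $\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$.

• If $(X,\tau)$ is separable and metrizable, $\beta=\big(\boldsymbol{\mathsf p}\circ(x_1,\varrho_1^{1/2}(x_1);x_2,\varrho_2^{1/2}(x_2))\big)_\sharp\gamma$.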

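Proof. Identity (7.53) and the first statement immediately follow by combining the previous
Lemma 7.19 with Remark 7.5 and (6.31).

If $\alpha$ is an optimal plan for the formulation (7.31a,b) we can apply Lemma 7.9(iii) to
find $\bar\alpha\ge\alpha$ optimal for (7.22), so that $\alpha(\boldsymbol{\mathfrak C}')\le\bar\alpha(\boldsymbol{\mathfrak C}')=0$.

Since all the optimal plans for $\mathsf{HK}$ do not charge $\boldsymbol{\mathfrak C}'$, combining Lemma 7.9, Remark
7.5 and Theorems 6.3 and 6.7, statements (i), (ii), and (iii) follow easily.

Concerning (iv), the optimality of $\tilde\alpha$ is obvious from the formulation (7.31b) and the
optimality of $\beta=\mathrm{dil}_{\vartheta,2}(\tilde\alpha)$ follows from the invariance of (7.31b) with respect to dilations.
We notice that $\beta$-almost everywhere in $\boldsymbol G$ we have

$$\begin{aligned}
\sum_iU_0(r_i^2)+\mathsf c(x_1,x_2)&=\sum_i\big(r_i^2-1-\log r_i^2\big)-\log\big(\cos^2(\mathsf d_{\pi/2}(x_1,x_2))\big)\\
&=\sum_ir_i^2-2-2\log\big(r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2))\big)\\
&=r_1^2+r_2^2-2r_1r_2\cos(\mathsf d_{\pi/2}(x_1,x_2)),
\end{aligned}$$

so that by (7.31a) we arrive at

$$\mathsf{HK}^2(\mu_1,\mu_2)=\int\Big(\sum_iU_0(r_i^2)+\mathsf c(x_1,x_2)\Big)\,d\beta+\sum_i\big(\mu_i(X)-\mathfrak h^2_i\beta(X)\big). \tag{7.54}$$

Let us now set $\gamma:=(x_1,x_2)_\sharp\beta\in\mathcal M(X\times X)$ and $\beta_i:=\pi^i_\sharp\beta\in\mathcal M(\mathfrak C)$, which yield
$\gamma_i:=\pi^i_\sharp\gamma=(x_i)_\sharp\beta=x_\sharp\beta_i\in\mathcal M(X)$ and $\mu'_i:=\mathfrak h^2_i\beta=(x_i)_\sharp(r_i^2\beta)=x_\sharp(r^2\beta_i)$. Denoting by
$(\beta_{i,x})_{x\in X}$ the disintegration of $\beta_i$ with respect to $\gamma_i$ (here we need the metrizability and
separability of $(X,\tau)$, see [2, Section 5.3]), we find

$$\int\zeta\,d\mu'_i=\int_{\mathfrak C}\zeta(x)r^2\,d\beta_i=\int_X\Big(\int_{\mathfrak C}\zeta(x)r^2\,d\beta_{i,x}\Big)d\gamma_i=\int_X\zeta(x)\Big(\int r^2\,d\beta_{i,x}\Big)d\gamma_i$$

for all $\zeta\in B_b(X)$, so that

$$\mu'_i=\tilde\varrho_i\gamma_i\le\mu_i\quad\text{with}\quad\tilde\varrho_i(x):=\int r^2\,d\beta_{i,x}.$$

Applying Jensen inequality we obtain

$$\int U_0(r_i^2)\,d\beta=\int_{\mathfrak C}U_0(r^2)\,d\beta_i=\int_X\int_{\mathfrak C}U_0(r^2)\,d\beta_{i,x}\,d\gamma_i\ge\int_XU_0\Big(\int r^2\,d\beta_{i,x}\Big)d\gamma_i=\int_XU_0(\tilde\varrho_i(x))\,d\gamma_i.$$

Now $\int\mathsf c(x_1,x_2)\,d\beta=\int\mathsf c(x_1,x_2)\,d\gamma$ and (7.54) imply

$$\mathsf{HK}^2(\mu_1,\mu_2)\ge\sum_i\int_XU_0(\tilde\varrho_i)\,d\gamma_i+\int_{X\times X}\mathsf c\,d\gamma+\sum_i\nu_i(X)$$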

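with $\nu_i:=\mu_i-\mu'_i\in\mathcal M(X)$. Hence, $\mu_i=\tilde\varrho_i\gamma_i+\nu_i$, and comparing with the standard decomposition
$\mu_i=\varrho_i\gamma_i+\mu_i^\perp$ (cf. (2.8)) we get $\nu_i=\mu_i^\perp+(\varrho_i-\tilde\varrho_i)\gamma_i\ge\mu_i^\perp$, so that $\varrho_i\ge\tilde\varrho_i$. Hence, $U_0(s)=s-1-\log s$
and the monotonicity of the logarithm yield

$$\begin{aligned}
\mathsf{HK}^2(\mu_1,\mu_2)&\ge\sum_i\Big(\int U_0(\tilde\varrho_i)\,d\gamma_i+\nu_i(X)\Big)+\int\mathsf c\,d\gamma\\
&=\sum_i\Big(\int\big(U_0(\tilde\varrho_i)+\varrho_i-\tilde\varrho_i\big)\,d\gamma_i+\mu_i^\perp(X)\Big)+\int\mathsf c\,d\gamma\\
&\ge\sum_i\Big(\int U_0(\varrho_i)\,d\gamma_i+\mu_i^\perp(X)\Big)+\int\mathsf c\,d\gamma\ge\mathsf{LET}(\mu_1,\mu_2),
\end{aligned}$$

where the last estimate follows from Theorem 6.2(b). Above, the first inequality is strict
if $\tilde\varrho_i\ne\varrho_i$, so that $\varrho_i>\tilde\varrho_i$ on some set with positive $\gamma_i$-measure.

By the first statement of the Theorem it follows that $\gamma\in\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$. Hence,
all the inequalities are in fact identities, and we conclude $\tilde\varrho_i=\varrho_i$. Since $U_0$ is strictly
convex, the disintegration measure $\beta_{i,x_i}$ is a Dirac measure concentrated on $\sqrt{\varrho_i(x_i)}$, so
that $\beta=\big(\boldsymbol{\mathsf p}\circ(x_1,\varrho_1^{1/2}(x_1);x_2,\varrho_2^{1/2}(x_2))\big)_\sharp\gamma$. □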

We observe that the system $(\gamma,\varrho_1,\varrho_2)$ provided by the previous Theorem enjoys a
few remarkable properties, that are not obvious from the original Hellinger-Kantorovich
formulation.

a) First of all, the annihilated part $\mu_i^\perp$ of the measures $\mu_i$ is concentrated on the set

$$M_{ij}:=\{x_i\in X:\ \mathsf d(x_i,\mathrm{supp}(\mu_j))\ge\pi/2\}.$$

When $\mu_i(M_{ij})=0$ then $\mu_i\ll\gamma_i$.

b) As a second property, an optimal plan $\gamma\in\mathrm{Opt}_{\mathsf{LET}}(\mu_1,\mu_2)$ provides an optimal plan
$\alpha=\big(\boldsymbol{\mathsf p}\circ(x_1,\varrho_1^{1/2}(x_1);x_2,\varrho_2^{1/2}(x_2))\big)_\sharp\gamma$ which is concentrated on the graph of the map
$(\varrho_1^{1/2}(x_1),\varrho_2^{1/2}(x_2))$ from $X\times X$ to $\mathbb R_+\times\mathbb R_+$, where the maps $\varrho_i$ are independent, in
the sense that $\varrho_i$ only depends on $x_i$.

c) A third important application of Theorem 7.20 is the duality formula for the $\mathsf{HK}$ functional
which directly follows from (6.14) of Theorem 6.3. We will state it in a slightly
different form in the next theorem, whose interpretation will be clearer in the light of
Section 8.4. It is based on the inf-convolution formula

$$\mathscr P_1\xi(x):=\inf_{x'\in X}\Big(\frac{\xi(x')}{1+2\xi(x')}+\frac{\sin^2(\mathsf d_{\pi/2}(x,x'))}{2(1+2\xi(x'))}\Big)=\inf_{x'\in X}\frac12\Big(1-\frac{\cos^2(\mathsf d_{\pi/2}(x,x'))}{1+2\xi(x')}\Big), \tag{7.55}$$

where $\xi\in B(X)$ with $\xi>-1/2$.


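Theorem 7.21 (Duality formula for $\mathsf{HK}$).

(i) If $\xi\in B_b(X)$ with $\inf_X\xi>-1/2$ then the function $\mathscr P_1\xi$ defined by (7.55) belongs to
$\mathrm{Lip}_b(X)$, satisfies $\sup_X\mathscr P_1\xi<1/2$, and admits the equivalent representation

$$\mathscr P_1\xi(x)=\inf_{x'\in B_{\pi/2}(x)}\frac12\Big(1-\frac{\cos^2(\mathsf d(x,x'))}{1+2\xi(x')}\Big). \tag{7.56}$$

In particular, if $\xi$ has bounded support then $\mathscr P_1\xi\in\mathrm{Lip}_{bs}(X)$, the space of Lipschitz
functions with bounded support.

(ii) Let us suppose that $(X,\mathsf d)$ is a separable metric space and $\tau$ is induced by $\mathsf d$. For
every $\mu_0,\mu_1\in\mathcal M(X)$ we have

$$\frac12\mathsf{HK}^2(\mu_0,\mu_1)=\sup\Big\{\int\mathscr P_1\xi\,d\mu_1-\int\xi\,d\mu_0:\ \xi\in\mathrm{Lip}_{bs}(X),\ \inf_X\xi>-1/2\Big\}. \tag{7.57}$$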


Proof. Let us first observe that

$$-\frac12<a\le\xi\le b\ \text{ in }X\quad\Longrightarrow\quad\frac{a}{1+2a}\le\mathscr P_1\xi\le\frac{b}{1+2b}\ \text{ in }X, \tag{7.58}$$

where the upper bound follows using $x'=x$, while the lower bound is easily seen from
the first form of $\mathscr P_1\xi$ in (7.55) and $\sin^2\ge0$. Since $1/(1+2\xi(x'))\le1/(1+2a)$ for
every $x'\in X$, the function $\mathscr P_1\xi$ is also Lipschitz, because it is the infimum of a family of
uniformly Lipschitz functions.

Moreover, for $\mathsf d(x,x')\ge\pi/2$ we have the estimate

$$\frac12\Big(1-\frac{\cos^2(\mathsf d_{\pi/2}(x,x'))}{1+2\xi(x')}\Big)=\frac12\ge\frac{b}{1+2b}\quad\text{if }\mathsf d(x,x')\ge\pi/2, \tag{7.59}$$

which immediately gives (7.56). In particular, we have

$$\xi\equiv0\ \text{in }X\setminus B\quad\Longrightarrow\quad\mathscr P_1\xi\equiv0\ \text{in }\{x\in X:\ \mathsf d(x,B)\ge\pi/2\}. \tag{7.60}$$
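As an illustration of (7.55) and (7.58), the inf-convolution can be evaluated on a finite metric space. This is a minimal sketch, assuming a hypothetical five-point subset of the real line with the Euclidean distance and an arbitrary test function `xi` with values above -1/2; the function name `p1_both_forms` is not from the text.

```python
import math


def p1_both_forms(xi, dist):
    """Evaluate both expressions of (7.55) at every point of a finite metric space.

    xi   -- list of values xi(x') > -1/2
    dist -- matrix of pairwise distances d(x, x')
    """
    n = len(xi)
    first_form, second_form = [], []
    for i in range(n):
        candidates_first, candidates_second = [], []
        for j in range(n):
            d = min(dist[i][j], math.pi / 2.0)  # truncated distance d_{pi/2}
            denom = 1.0 + 2.0 * xi[j]
            candidates_first.append(xi[j] / denom + math.sin(d) ** 2 / (2.0 * denom))
            candidates_second.append(0.5 * (1.0 - math.cos(d) ** 2 / denom))
        first_form.append(min(candidates_first))
        second_form.append(min(candidates_second))
    return first_form, second_form


# Hypothetical data: five points on the real line and a bounded function xi.
points = [0.0, 0.4, 1.1, 2.5, 4.0]
dist = [[abs(p - q) for q in points] for p in points]
xi = [-0.3, 0.2, 1.0, -0.1, 0.5]

first_form, second_form = p1_both_forms(xi, dist)
a, b = min(xi), max(xi)
lower, upper = a / (1.0 + 2.0 * a), b / (1.0 + 2.0 * b)  # bounds from (7.58)

for x, v in zip(points, first_form):
    print(f"x = {x:4.1f}   P_1 xi(x) = {v:+.6f}   bounds: [{lower:+.4f}, {upper:+.4f}]")
```

On a finite space the infimum is a minimum, so the two forms of (7.55) can be compared entry by entry.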

Let us now prove statement (ii). We denote by $E$ the right-hand side of (7.57)
and by $E'$ the analogous expression where $\xi$ runs in $C_b(X)$:

$$E':=\sup\Big\{\int\mathscr P_1\xi\,d\mu_1-\int\xi\,d\mu_0:\ \xi\in C_b(X),\ \inf_X\xi>-1/2\Big\}. \tag{7.61}$$

It is clear that $E'\ge E$. If $\xi\in C_b(X)$ with $\inf_X\xi>-1/2$, setting $\psi_1(x_1):=-2\xi(x_1)$,
$\psi_2(x_2):=2(\mathscr P_1\xi)(x_2)$, we know that $\sup_X\psi_2<1$ and $\psi_2\in\mathrm{Lip}_b(X)$. Thus, $\psi_1$ and $\psi_2$
are continuous and satisfy

$$(1-\psi_2(x_2))(1-\psi_1(x_1))\ge\cos^2(\mathsf d_{\pi/2}(x_1,x_2)).$$

Hence, the couple $(\psi_1,\psi_2)$ is admissible for (6.14) (with $C_b(X)$ instead of $\mathrm{LSC}_s(X)$; note
that $\tau$ is metrizable and thus completely regular), so that $\frac12\mathsf{HK}^2(\mu_0,\mu_1)=\frac12\mathsf{LET}(\mu_0,\mu_1)\ge E'$.

On the other hand, if $(\psi_1,\psi_2)\in C_b(X)\times C_b(X)$ is admissible for (6.14) with $\sup_X\psi_i<1$, setting $\xi_1:=-\frac12\psi_1$
and $\xi_2:=\mathscr P_1\xi_1$ we see that $2\xi_2\ge\psi_2$, giving $E'\ge\frac12\mathsf{HK}^2(\mu_0,\mu_1)$, so that $E'=\frac12\mathsf{HK}^2(\mu_0,\mu_1)$ follows.

To show that $E=E'$ in the general case, we approximate $\xi\in C_b(X)$ with $\inf_X\xi>-1/2$
by a decreasing sequence of Lipschitz and bounded functions (e.g. by taking $\xi_n(x):=\sup_y\big(\xi(y)-n\,\mathsf d_\pi(x,y)\big)$)
and use that the supremum in (7.61) does not change if we restrict
it to $\mathrm{Lip}_b(X)$.

Let now $\xi$ be Lipschitz and valued in $[a,b]$ with $-1/2<a\le0\le b$. Taking the
increasing sequence of nonnegative cut-off functions $\zeta_n(x):=0\vee(n-\mathsf d(x,\bar x))\wedge1$, where $\bar x\in X$ is a fixed point, which
are uniformly 1-Lipschitz, have bounded support and satisfy $\zeta_n\uparrow1$ as $n\to\infty$, it is easy
to check that $\xi_n:=\zeta_n\xi$ belong to $\mathrm{Lip}_{bs}(X)$ and take values in the interval $[a,b]$ so that

$$\frac{a}{1+2a}\le\mathscr P_1\xi_n\le\frac{b}{1+2b}.$$

Since $\xi_n(x)=0$ if $\mathsf d(x,\bar x)\ge n$ and $\xi_n(x)=\xi(x)$ if $\mathsf d(x,\bar x)\le n-1$, by (7.56) we get

$$\mathscr P_1\xi_n(x)=0\ \text{ if }\mathsf d(x,\bar x)\ge n+\pi/2,\qquad\mathscr P_1\xi_n(x)=\mathscr P_1\xi(x)\ \text{ if }\mathsf d(x,\bar x)\le n-1-\pi/2. \tag{7.62}$$

Thus $\mathscr P_1\xi_n\in\mathrm{Lip}_{bs}(X)$ and the Lebesgue Dominated Convergence theorem shows that

$$\lim_{n\to\infty}\Big(\int_X\mathscr P_1\xi_n\,d\mu_1-\int_X\xi_n\,d\mu_0\Big)=\int_X\mathscr P_1\xi\,d\mu_1-\int_X\xi\,d\mu_0.\quad□$$

Jx Jx Jx Jx 


7.7 Limiting cases: recovering the Hellinger-Kakutani distance
and the Kantorovich-Wasserstein distance

In this section we will show that we can recover the Hellinger-Kakutani and the Kantorovich-
Wasserstein distance by suitably rescaling the $\mathsf{HK}$ functional.

The Hellinger-Kakutani distance. As we have seen in Example E.5 of Section 3.3,
the Hellinger-Kakutani distance between two measures $\mu_1,\mu_2\in\mathcal M(X)$ can be obtained
as a limiting case when the space $X$ is endowed with the discrete distance

$$\mathsf d_{\mathrm{Hell}}(x_1,x_2):=\begin{cases}a&\text{if }x_1\ne x_2,\\0&\text{if }x_1=x_2,\end{cases}\qquad\text{with }a\in[\pi,+\infty]. \tag{7.63}$$

The induced cone distance in this case is

$$\mathsf d_{\mathfrak C}^2([x_1,r_1],[x_2,r_2])=\begin{cases}(r_1-r_2)^2&\text{if }x_1=x_2,\\r_1^2+r_2^2&\text{if }x_1\ne x_2,\end{cases} \tag{7.64}$$

and the induced cost function for the Entropy-Transport formalism is given by

$$\mathsf c_{\mathrm{Hell}}(x_1,x_2):=\begin{cases}0&\text{if }x_1=x_2,\\+\infty&\text{otherwise.}\end{cases} \tag{7.65}$$

Recalling (3.23)-(3.24) we obtain

$$\mathsf{Hell}^2(\mu_1,\mu_2)=\mathsf{LET}_{\mathrm{Hell}}(\mu_1,\mu_2)=\int_X\big(\sqrt{\varrho_1}-\sqrt{\varrho_2}\big)^2\,d\gamma\quad\text{with }\mu_i=\varrho_i\gamma\ll\gamma\in\mathcal M(X). \tag{7.66}$$

Since $\mathsf c_{\mathrm{Hell}}\ge\mathsf c=\ell(\mathsf d)$ for every distance function on $X$, we always have the upper bound

$$\mathsf{HK}(\mu_1,\mu_2)\le\mathsf{Hell}(\mu_1,\mu_2)\quad\text{for every }\mu_1,\mu_2\in\mathcal M(X). \tag{7.67}$$

Applying Lemma we easily get 

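Theorem 7.22 (Convergence of $\mathsf{HK}$ to $\mathsf{Hell}$). Let $(X,\tau,\mathsf d)$ be an extended metric topological
space and let $\mathsf{HK}_{\lambda\mathsf d}$ be the Hellinger-Kantorovich distances in $\mathcal M(X)$ induced by the
distances $\mathsf d_\lambda:=\lambda\mathsf d$, $\lambda>0$. For every couple $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{HK}_{\lambda\mathsf d}(\mu_1,\mu_2)\uparrow\mathsf{Hell}(\mu_1,\mu_2)\quad\text{as }\lambda\uparrow\infty. \tag{7.68}$$

The Kantorovich-Wasserstein distance. Let us first observe that whenever $\mu_1,\mu_2\in\mathcal M(X)$
have the same mass their $\mathsf{HK}$-distance is always bounded from above by the
Kantorovich-Wasserstein distance $\mathsf W_{\mathsf d}$ (the upper bound is trivial when $\mu_1(X)\ne\mu_2(X)$,
since in this case $\mathsf W_{\mathsf d}(\mu_1,\mu_2)=+\infty$).

Proposition 7.23. For every couple $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\mathsf{HK}(\mu_1,\mu_2)\le\mathsf W_{\mathsf d_{\pi/2}}(\mu_1,\mu_2)\le\mathsf W_{\mathsf d}(\mu_1,\mu_2). \tag{7.69}$$

Proof. It is not restrictive to assume that $\mathsf W^2_{\mathsf d_{\pi/2}}(\mu_1,\mu_2)=\int\mathsf d^2_{\pi/2}\,d\gamma<\infty$ for an optimal
plan $\gamma$ with marginals $\mu_i$. We then define the plan $\alpha:=\mathsf s_\sharp\gamma\in\mathcal M(\mathfrak C\times\mathfrak C)$ where $\mathsf s(x_1,x_2):=([x_1,1],[x_2,1])$,
so that $\mathfrak h^2_i\alpha=\mu_i$. By using (7.52) and (7.3) we obtain

$$\mathsf{HK}^2(\mu_1,\mu_2)\le\int4\sin^2\big(\mathsf d_{\pi/2}(x_1,x_2)/2\big)\,d\alpha\le\int\mathsf d^2_{\pi/2}(x_1,x_2)\,d\gamma=\mathsf W^2_{\mathsf d_{\pi/2}}(\mu_1,\mu_2).\quad□$$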

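In order to recover the Kantorovich-Wasserstein distance we perform a simultaneous
scaling, by taking the limit of $n\,\mathsf{HK}_{\mathsf d/n}$ where $\mathsf{HK}_{\mathsf d/n}$ is induced by the distance $\mathsf d/n$.

Theorem 7.24 (Convergence of $\mathsf{HK}$ to $\mathsf W$). Let $(X,\tau,\mathsf d)$ be an extended metric topological
space and let $\mathsf{HK}_{\mathsf d/\lambda}$ be the Hellinger-Kantorovich distances in $\mathcal M(X)$ induced by the
distances $\lambda^{-1}\mathsf d$ for $\lambda>0$. Then, for all $\mu_1,\mu_2\in\mathcal M(X)$ we have

$$\lambda\,\mathsf{HK}_{\mathsf d/\lambda}(\mu_1,\mu_2)\uparrow\mathsf W_{\mathsf d}(\mu_1,\mu_2)\quad\text{as }\lambda\uparrow\infty. \tag{7.70}$$

Proof. Let us denote by $\mathsf{LET}_\lambda=\mathsf{HK}^2_{\mathsf d/\lambda}$ the optimal value of the LET-problem associated to
the distance $\mathsf d/\lambda$. Since the Kantorovich-Wasserstein distance is invariant by the rescaling
$\lambda\mathsf W_{\mathsf d/\lambda}=\mathsf W_{\mathsf d}$, estimate (7.69) shows that $\lambda\,\mathsf{HK}_{\mathsf d/\lambda}\le\mathsf W_{\mathsf d}$.

Since $x\mapsto\sin(x\wedge\pi/2)$ is concave in $[0,\infty)$, the function $x\mapsto\sin(x\wedge\pi/2)/x$ is
decreasing in $[0,\infty)$, so that $\alpha\sin((d/\alpha)\wedge\pi/2)\le\lambda\sin((d/\lambda)\wedge\pi/2)$ for every $d\ge0$ and
$0<\alpha<\lambda$. Combining (7.52) with (7.10b) we see that the map $\lambda\mapsto\lambda\,\mathsf{HK}_{\mathsf d/\lambda}(\mu_1,\mu_2)$ is
nondecreasing.

It remains to prove that $L:=\lim_{\lambda\to\infty}\lambda\,\mathsf{HK}_{\mathsf d/\lambda}(\mu_1,\mu_2)=\sup_{\lambda\ge1}\lambda\,\mathsf{HK}_{\mathsf d/\lambda}(\mu_1,\mu_2)\ge\mathsf W_{\mathsf d}(\mu_1,\mu_2)$.
For this, it is not restrictive to assume that $L$ is finite.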

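Let $\gamma_\lambda$ be an optimal plan for $\mathsf{HK}_{\mathsf d/\lambda}(\mu_1,\mu_2)$ (in its $\mathsf{LET}_\lambda$ formulation) with marginals $\gamma_{\lambda,i}=\pi^i_\sharp\gamma_\lambda$. We denote by
$\mathscr F$ the entropy functionals associated to the logarithmic entropy $F(s)=U_1(s)$ and by $\mathscr G$ the
entropy functionals associated to $F(s):=I_1(s)$ as in Example E.3 of Section 3.3. Since
the transport part of the LET-functional is associated to the costs (cf. (6.6))

$$\mathsf c_\lambda(x_1,x_2)=\lambda^2\ell\big(\mathsf d(x_1,x_2)/\lambda\big)\ge\mathsf d^2(x_1,x_2),$$

we obtain the estimate

$$L^2\ge\lambda^2\,\mathsf{LET}_\lambda(\mu_1,\mu_2)\ge\sum_i\lambda^2\mathscr F(\gamma_{\lambda,i}|\mu_i)+\int_{X\times X}\mathsf d^2(x_1,x_2)\,d\gamma_\lambda. \tag{7.71}$$

Proposition 2.10 shows that the family of plans $(\gamma_\lambda)_{\lambda\ge1}$ is relatively compact with respect
to narrow convergence in $\mathcal M(X\times X)$. Since $\lambda^2F(s)\uparrow I_1(s)$, passing to the limit along a
suitable subnet $(\lambda(\alpha))_{\alpha\in\mathbb A}$ parametrized by a directed set $\mathbb A$, and applying Corollary 2.9
we get a limit plan $\gamma\in\mathcal M(X\times X)$ with marginals $\gamma_i$ such that

$$\sum_i\mathscr G(\gamma_i|\mu_i)\le L^2,\quad\text{which implies }\gamma_i=\mu_i.$$

In particular, we conclude that $\mu_1(X)=\gamma(X\times X)=\mu_2(X)$. Since $\mathsf d$ is lower semicontinuous,
narrow convergence of $\gamma_{\lambda(\alpha)}$ and (7.71) also yield

$$L^2\ge\liminf_{\alpha\in\mathbb A}\int\mathsf d^2(x_1,x_2)\,d\gamma_{\lambda(\alpha)}\ge\int\mathsf d^2(x_1,x_2)\,d\gamma\ge\mathsf W^2_{\mathsf d}(\mu_1,\mu_2).$$

□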
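The two scaling limits (7.68) and (7.70) can be observed explicitly for a pair of Dirac masses. The sketch below relies on an assumption not restated in this section, namely the standard closed form $\mathsf{HK}^2(a\delta_{x_1},b\delta_{x_2})=a+b-2\sqrt{ab}\cos(\mathsf d(x_1,x_2)\wedge\pi/2)$, i.e. the cone distance $\mathsf d_{\pi/2,\mathfrak C}$ between $[x_1,\sqrt a]$ and $[x_2,\sqrt b]$; the function names and the numerical values are hypothetical.

```python
import math


def hk_two_diracs(a, b, d):
    """HK distance between a*delta_{x1} and b*delta_{x2}, assuming d = d(x1, x2).

    Uses the algebraically equivalent, numerically stable form
    (sqrt(a) - sqrt(b))^2 + 4*sqrt(a*b)*sin^2(d_{pi/2}/2).
    """
    d_trunc = min(d, math.pi / 2.0)
    squared = (math.sqrt(a) - math.sqrt(b)) ** 2 \
        + 4.0 * math.sqrt(a * b) * math.sin(d_trunc / 2.0) ** 2
    return math.sqrt(squared)


def hellinger_two_diracs(a, b):
    """Hellinger-Kakutani distance between a*delta_{x1} and b*delta_{x2}, x1 != x2."""
    return math.sqrt(a + b)


def wasserstein_two_diracs(m, d):
    """Kantorovich-Wasserstein distance between m*delta_{x1} and m*delta_{x2}."""
    return math.sqrt(m) * d


a, b, m, d = 2.0, 0.5, 1.5, 0.3          # hypothetical masses and base distance
lambdas = [1.0, 2.0, 4.0, 8.0, 16.0, 1000.0]

# Theorem 7.22: HK induced by lambda*d increases to the Hellinger distance.
hk_up = [hk_two_diracs(a, b, lam * d) for lam in lambdas]

# Theorem 7.24: lambda * HK induced by d/lambda increases to the Wasserstein distance
# (equal masses, otherwise W_d is infinite).
hk_down = [lam * hk_two_diracs(m, m, d / lam) for lam in lambdas]

for lam, up, down in zip(lambdas, hk_up, hk_down):
    print(f"lambda = {lam:7.1f}   HK_(lambda d) = {up:.6f}   lambda*HK_(d/lambda) = {down:.6f}")
print("Hell =", hellinger_two_diracs(a, b), "  W_d =", wasserstein_two_diracs(m, d))
```

In this two-point setting the first sequence becomes stationary as soon as $\lambda\,\mathsf d\ge\pi/2$, while the second one approaches $\mathsf W_{\mathsf d}$ strictly from below.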


7.8 The Gaussian Hellinger-Kantorovich distance

We conclude this general introduction to the Hellinger-Kantorovich distance by discussing
another interesting example.

We consider the inverse function $g:\mathbb R_+\to[0,\pi/2)$ of $\sqrt\ell$:

$$g(z):=\arccos\big(e^{-z^2/2}\big),\quad\text{satisfying }g(0)=0,\ g'(0)=1,\ \ell(g(d))=d^2. \tag{7.72}$$

Since $\sqrt\ell$ is a convex function, $g$ is a concave increasing function in $[0,\infty)$ with $g(z)\le z$
and $\lim_{z\to\infty}g(z)=\pi/2$.

It follows that $\mathsf g:=g\circ\mathsf d$ is a distance in $X$, inducing the same topology as $\mathsf d$. We can
now introduce a distance $\mathsf{HK}_{\mathsf g}$ associated to $\mathsf g$. The corresponding distance on $\mathfrak C$ is given
by

$$\mathsf g_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2):=r_1^2+r_2^2-2r_1r_2\exp\big(-\mathsf d^2(x_1,x_2)/2\big). \tag{7.73}$$

From $g(z)\le z$ we have $\mathsf g_{\mathfrak C}\le\mathsf d_{\mathfrak C}$.
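The properties of $g$ listed after (7.72) are easy to probe numerically. The following is a rough sketch on a hypothetical grid of values of $z$; the function names `ell` and `g` mirror the symbols of the text, and a finite-grid check only illustrates the claims.

```python
import math


def ell(d):
    """Logarithmic cost ell(d) = -log(cos^2(min(d, pi/2))), equal to +inf for d >= pi/2."""
    if d >= math.pi / 2.0:
        return math.inf
    return -math.log(math.cos(d) ** 2)


def g(z):
    """Inverse of sqrt(ell): g(z) = arccos(exp(-z^2 / 2)), as in (7.72)."""
    return math.acos(math.exp(-z * z / 2.0))


zs = [0.01 * k for k in range(1, 500)]  # hypothetical grid in (0, 5)

# ell(g(z)) should reproduce z^2.
max_inverse_error = max(abs(ell(g(z)) - z * z) for z in zs)

# g(z) <= z and g(z) < pi/2 on the grid.
dominated_by_identity = all(g(z) <= z for z in zs)
below_half_pi = all(g(z) < math.pi / 2.0 for z in zs)

# Concavity: second differences of g on the grid should be nonpositive.
max_second_difference = max(
    g(zs[k + 1]) - 2.0 * g(zs[k]) + g(zs[k - 1]) for k in range(1, len(zs) - 1)
)

print("max |ell(g(z)) - z^2|  :", max_inverse_error)
print("g(z) <= z on the grid  :", dominated_by_identity)
print("g(z) < pi/2 on the grid:", below_half_pi)
print("max second difference  :", max_second_difference)
```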

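Theorem 7.25 (The Gaussian Hellinger-Kantorovich distance). The functional

$$\mathsf{GHK}^2(\mu_1,\mu_2):=\mathsf{HK}^2_{\mathsf g}(\mu_1,\mu_2)=\min\Big\{\int\mathsf g_{\mathfrak C}^2(\mathfrak y_1,\mathfrak y_2)\,d\alpha:\ \alpha\in\mathcal M(\mathfrak C\times\mathfrak C),\ \mathfrak h^2_i\alpha=\mu_i\Big\} \tag{7.74}$$

defines a distance on $\mathcal M(X)$ dominated by $\mathsf{HK}$. If $(X,\mathsf d)$ is separable (resp. complete) then
$(\mathcal M(X),\mathsf{GHK})$ is a separable (resp. complete) metric space, whose topology coincides with
the weak convergence. We also have

$$\begin{aligned}
\mathsf{GHK}^2(\mu_1,\mu_2)&=\min\Big\{\sum_i\mathscr F(\gamma_i|\mu_i)+\int\mathsf d^2(x_1,x_2)\,d\gamma:\ \gamma\in\mathcal M(X\times X)\Big\}\\
&=\sup\Big\{\sum_i\int\big(1-e^{-\varphi_i}\big)\,d\mu_i:\ \varphi_1\oplus\varphi_2\le\mathsf d^2\Big\}.
\end{aligned} \tag{7.75}$$

We shall see in the next Section 8.2 that $\mathsf{HK}$ is the length distance induced by $\mathsf{GHK}$ if
$\mathsf d$ is a length distance on $X$.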


8 Dynamic interpretation of the
Hellinger-Kantorovich distance

As in Section 7.5, in all this chapter we will suppose that $(X,\mathsf d)$ is a complete and separable
(possibly extended) metric space and $\tau$ coincides with the topology induced by $\mathsf d$. All the
results admit a natural generalization to the framework of extended metric-topological
spaces [1, Sec. 4].


8.1 Absolutely continuous curves and geodesics in the cone $\mathfrak C$

Absolutely continuous curves and metric derivative. If $(Z,\mathsf d_Z)$ is a (possibly
extended) metric space and $I$ is an interval of $\mathbb R$, a curve $z:I\to Z$ is absolutely continuous
if there exists $m\in L^1(I)$ such that

$$\mathsf d_Z(z(t_0),z(t_1))\le\int_{t_0}^{t_1}m(t)\,dt\quad\text{whenever }t_0,t_1\in I,\ t_0<t_1. \tag{8.1}$$

Its metric derivative $|z'|_{\mathsf d_Z}$ (we will omit the index $\mathsf d_Z$ when the choice of the metric is
clear from the context) is the Borel function defined by

$$|z'|_{\mathsf d_Z}(t):=\limsup_{h\to0}\frac{\mathsf d_Z(z(t+h),z(t))}{|h|}, \tag{8.2}$$

and it is possible to show (see [2]) that the limsup above is in fact a limit for $\mathscr L^1$-a.e. points
in $I$ and it provides the minimal (up to possible modifications in $\mathscr L^1$-negligible sets)
function $m$ for which (8.1) holds. We will denote by $\mathrm{AC}^p(I;Z)$ the class of all absolutely
continuous curves $z:I\to Z$ with $|z'|\in L^p(I)$; when $I$ is an open set of $\mathbb R$, we will also
consider the local space $\mathrm{AC}^p_{loc}(I;Z)$. If $Z$ is complete and separable then $\mathrm{AC}^p([0,1];Z)$ is
a Borel set in the space $C([0,1];Z)$ endowed with the topology of uniform convergence.
(This property can be extended to the framework of extended metric-topological spaces,
see [3].)

A curve $z:[0,1]\to Z$ is a (minimal, constant speed) geodesic if

$$\mathsf d_Z(z(t_0),z(t_1))=|t_1-t_0|\,\mathsf d_Z(z(0),z(1))\quad\text{for every }t_0,t_1\in[0,1]. \tag{8.3}$$

In particular $z$ is Lipschitz and $|z'|\equiv\mathsf d_Z(z(0),z(1))$ in $[0,1]$. We denote by $\mathrm{Geo}(Z)\subset C([0,1];Z)$
the closed subset of all the geodesics.

A metric space $(Z,\mathsf d_Z)$ is called a length (or intrinsic) space if the distance between
arbitrary couples of points can be obtained as the infimum of the length of the absolutely
continuous curves connecting them. It is called a geodesic (or strictly intrinsic) space if
every couple of points $z_0,z_1$ at finite distance can be joined by a geodesic.

Geodesics in $\mathfrak C$. If $(X,\mathsf d)$ is a geodesic (resp. length) space, then also $\mathfrak C$ is a geodesic
(resp. length) space, cf. [9, Sec. 3.6]. The geodesic connecting a point $\mathfrak y=[x,r]$ with $\mathfrak o$ is

$$\mathfrak y(t)=[x,tr]=\mathfrak y\cdot t\quad\text{for }t\in[0,1]. \tag{8.4}$$

If $x_1,x_2\in X$ with $\mathsf d(x_1,x_2)\ge\pi$, then a geodesic between $\mathfrak y_i=[x_i,r_i]$ can be easily
obtained by joining two geodesics connecting $\mathfrak y_i$ to $\mathfrak o$ as before; observe that in this case
$\mathsf d_{\mathfrak C}(\mathfrak y_1,\mathfrak y_2)=r_1+r_2$.

In the case when $\mathsf d(x_1,x_2)<\pi$ and $r_1,r_2>0$, every geodesic $\mathfrak y:I\to\mathfrak C$ connecting $\mathfrak y_1$
to $\mathfrak y_2$ is associated to a geodesic $x$ in $X$ joining $x_1$ to $x_2$ and parametrized with unit speed
in the interval $[0,\mathsf d(x_1,x_2)]$. To find the radius $r(t)$, we use the complex plane $\mathbb C$: we write
the curve connecting $z_1=r_1\in\mathbb C$ to $z_2=r_2\exp(i\,\mathsf d(x_1,x_2))\in\mathbb C$ in polar coordinates,
namely

$$z(t)=r(t)\exp(i\theta(t)),\qquad\begin{aligned}r^2(t)&=(1-t)^2r_1^2+t^2r_2^2+2t(1-t)r_1r_2\cos(\mathsf d(x_1,x_2)),\\\cos(\theta(t))&=\frac{(1-t)r_1+tr_2\cos(\mathsf d(x_1,x_2))}{r(t)},\end{aligned} \tag{8.5}$$

and then the geodesic curve in $\mathfrak C$ takes the form

$$\mathfrak y(t)=[x(\theta(t)),r(t)]. \tag{8.6}$$
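The construction (8.5)-(8.6) can be tested in the simplest geodesic base space. The sketch below assumes $X=\mathbb R$ with the Euclidean distance (so that the unit speed geodesic from $x_1$ to $x_2$ is a straight segment) and endpoints with $0<|x_1-x_2|<\pi$ and positive radii; the function names are hypothetical. It checks the constant speed property (8.3) on a grid of times.

```python
import math


def cone_distance(y1, y2):
    """Cone distance d_C between y1 = (x1, r1) and y2 = (x2, r2) over X = R."""
    (x1, r1), (x2, r2) = y1, y2
    d_pi = min(abs(x1 - x2), math.pi)
    return math.sqrt((r1 - r2) ** 2 + 4.0 * r1 * r2 * math.sin(d_pi / 2.0) ** 2)


def cone_geodesic(y1, y2, t):
    """Point at time t of the curve (8.5)-(8.6) joining y1 to y2 (X = R assumed)."""
    (x1, r1), (x2, r2) = y1, y2
    d = abs(x1 - x2)
    if not (0.0 < d < math.pi and r1 > 0.0 and r2 > 0.0):
        raise ValueError("sketch only covers 0 < d(x1, x2) < pi and r1, r2 > 0")
    # Radius and angle of z(t) = (1 - t) * r1 + t * r2 * exp(i d) in polar coordinates.
    r = math.sqrt((1 - t) ** 2 * r1 ** 2 + t ** 2 * r2 ** 2
                  + 2.0 * t * (1 - t) * r1 * r2 * math.cos(d))
    cos_theta = ((1 - t) * r1 + t * r2 * math.cos(d)) / r
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against rounding
    # Unit speed geodesic in R from x1 towards x2, evaluated at arclength theta.
    direction = 1.0 if x2 >= x1 else -1.0
    return (x1 + direction * theta, r)


y_start, y_end = (0.0, 1.0), (2.0, 3.0)   # hypothetical endpoints [x, r]
total = cone_distance(y_start, y_end)
times = [k / 10.0 for k in range(11)]
curve = [cone_geodesic(y_start, y_end, t) for t in times]

# Largest deviation from d_C(y(s), y(t)) = |t - s| * d_C(y(0), y(1)).
max_speed_defect = max(
    abs(cone_distance(curve[i], curve[j]) - abs(times[i] - times[j]) * total)
    for i in range(len(times)) for j in range(len(times))
)
print("d_C(y1, y2)           :", total)
print("max constant-speed gap:", max_speed_defect)
```

The check works because the cone over an arc of length less than $\pi$ is isometric to a planar sector, where the segment $z(t)$ is a constant speed geodesic.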

Absolutely continuous curves in $\mathfrak C$. We want to obtain now a simple characterization
of absolutely continuous curves in $\mathfrak C$. If $t\mapsto\mathfrak y(t)$ is a continuous curve in $\mathfrak C$, with
$t\in[0,1]$, it is clear that $r(t):=r(\mathfrak y(t))$ is a continuous curve with values in $[0,\infty)$. We
can then consider the open set $O_r=r^{-1}((0,\infty))$ and the map $x:[0,1]\to X$ defined by
$x(t):=x(\mathfrak y(t))$, whose restriction to $O_r$ is also continuous. Thus any continuous curve
$\mathfrak y:I\to\mathfrak C$ can be lifted to a couple of maps $\boldsymbol y=\mathsf y\circ\mathfrak y=(x,r):[0,1]\to Y$ with $r$ continuous
and $x$ continuous on $O_r$ and constant on its complement. Conversely, it is clear that
starting from a couple $\boldsymbol y=(x,r)$ as above, then $\mathfrak y=\mathsf p\circ\boldsymbol y$ is continuous in $\mathfrak C$. We thus
introduce the set

$$\tilde C([0,1];Y):=\big\{\boldsymbol y=(x,r):[0,1]\to Y:\ r\in C([0,1];\mathbb R_+),\ x|_{O_r}\text{ is continuous}\big\} \tag{8.7}$$

and for $p\ge1$ the analogous spaces

$$\widetilde{\mathrm{AC}}^p([0,1];Y):=\big\{\boldsymbol y=(x,r)\in\tilde C([0,1];Y):\ r\in\mathrm{AC}^p([0,1];\mathbb R_+),\ x|_{O_r}\in\mathrm{AC}^p_{loc}(O_r;X),\ r|x'|\in L^p(O_r)\big\}. \tag{8.8}$$

If $\boldsymbol y=(x,r)\in\widetilde{\mathrm{AC}}^p([0,1];Y)$ we define the Borel map $|\boldsymbol y'|:[0,1]\to\mathbb R_+$ by

$$|\boldsymbol y'|^2(t):=|r'(t)|^2+r^2(t)\,|x'|^2_{\mathsf d}(t)\ \text{ if }t\in O_r,\qquad|\boldsymbol y'|(t)=0\ \text{ otherwise.} \tag{8.9}$$

For absolutely continuous curves the following characterization holds:

Lemma 8.1. Let $\mathfrak y\in C([0,1];\mathfrak C)$ be lifted to $\boldsymbol y=\mathsf y\circ\mathfrak y\in\tilde C([0,1];Y)$. Then $\mathfrak y\in\mathrm{AC}^p([0,1];\mathfrak C)$
if and only if $\boldsymbol y=(x,r)\in\widetilde{\mathrm{AC}}^p([0,1];Y)$ and

$$|\mathfrak y'|_{\mathsf d_{\mathfrak C}}(t)=|\boldsymbol y'|(t)\quad\text{for }\mathscr L^1\text{-a.e. }t\in[0,1]. \tag{8.10}$$

Proof. By (7.4) one immediately sees that if $\mathfrak y=\mathsf p\circ\boldsymbol y\in\mathrm{AC}^p([0,1];\mathfrak C)$ then $r$ belongs
to $\mathrm{AC}^p([0,1];\mathbb R)$ and $x\in\mathrm{AC}^p_{loc}(O_r;X)$. Since $\mathfrak y$ is absolutely continuous, we can evaluate
the metric derivative at a.e. $t\in O_r$ where also $r'$ and $|x'|$ exist: starting from (7.3) leads
to the limit

$$\lim_{h\to0}\frac{\mathsf d_{\mathfrak C}^2(\mathfrak y(t+h),\mathfrak y(t))}{h^2}=\lim_{h\to0}\frac{|r(t+h)-r(t)|^2+4r(t+h)r(t)\sin^2\big(\tfrac12\mathsf d_\pi(x(t+h),x(t))\big)}{h^2}=|r'(t)|^2+r^2(t)\,|x'|^2_{\mathsf d}(t),$$

which provides (8.10).

Moreover, the same calculations show that if the lifting $\boldsymbol y$ belongs to $\widetilde{\mathrm{AC}}^p([0,1];Y)$
then the restriction of $\mathfrak y$ to each connected component of $O_r$ is absolutely continuous with
metric velocity given by (8.10) in $L^p(0,1)$. Since $\mathfrak y$ is globally continuous and constant in
$[0,1]\setminus O_r$, we conclude that $\mathfrak y\in\mathrm{AC}^p([0,1];\mathfrak C)$. □
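Formula (8.10) can be compared with a finite difference quotient of the cone distance. This is only an illustrative sketch: it assumes $X=\mathbb R$ with the Euclidean distance and a hypothetical smooth lifted curve $x(t)=\sin t$, $r(t)=1+t^2$, for which $|x'|(t)=|\cos t|$ and $r'(t)=2t$.

```python
import math


def cone_distance(y1, y2):
    """Cone distance d_C between y1 = (x1, r1) and y2 = (x2, r2) over X = R."""
    (x1, r1), (x2, r2) = y1, y2
    d_pi = min(abs(x1 - x2), math.pi)
    return math.sqrt((r1 - r2) ** 2 + 4.0 * r1 * r2 * math.sin(d_pi / 2.0) ** 2)


def lifted_curve(t):
    """Hypothetical smooth lifted curve y(t) = (x(t), r(t)) with r > 0."""
    return (math.sin(t), 1.0 + t * t)


def lifted_speed_squared(t):
    """Right-hand side of (8.9): |r'(t)|^2 + r(t)^2 * |x'(t)|^2."""
    x_dot = math.cos(t)
    r = 1.0 + t * t
    r_dot = 2.0 * t
    return r_dot ** 2 + r ** 2 * x_dot ** 2


t0, h = 0.5, 1e-4
difference_quotient_squared = (cone_distance(lifted_curve(t0 + h), lifted_curve(t0)) / h) ** 2

print("finite difference of d_C^2 / h^2:", difference_quotient_squared)
print("|r'|^2 + r^2 |x'|^2             :", lifted_speed_squared(t0))
```

The agreement is only up to an error of order $h$, as expected from a one-sided difference quotient.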


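As a consequence, in a length space, we get the variational representation formula

$$\mathsf d_{\mathfrak C}^2(\mathfrak y_0,\mathfrak y_1)=\inf\Big\{\int_{[0,1]\cap\{r>0\}}\big(r^2(t)\,|x'|^2_{\mathsf d}(t)+|r'(t)|^2\big)\,dt:\ (x,r)\in\widetilde{\mathrm{AC}}^2([0,1];Y),\ [x(i),r(i)]=\mathfrak y_i,\ i=0,1\Big\}. \tag{8.11}$$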

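Remark 8.2 (The Euclidean case). Consider the case $X=\mathbb R^d$ with the usual Euclidean
distance $\mathsf d(x_1,x_2):=|x_1-x_2|$. For $\mathfrak y=[x,r]\in\mathrm{AC}^2([0,1];\mathfrak C)$, we can define a Borel vector
field $\mathfrak y'_{\mathfrak C}:[0,1]\to\mathbb R^{d+1}$ by

$$\mathfrak y'_{\mathfrak C}(t):=\begin{cases}(r(t)x'(t),r'(t))&\text{whenever }r(t)\ne0\text{ and the derivatives exist,}\\(0,0)&\text{otherwise.}\end{cases} \tag{8.12}$$

Then, (8.10) yields $|\mathfrak y'|_{\mathsf d_{\mathfrak C}}(t)=|\mathfrak y'_{\mathfrak C}(t)|_{\mathbb R^{d+1}}$ for $\mathscr L^1$-a.e. $t\in(0,1)$.

For $\psi\in C^1(\mathbb R^d\times[0,1])$ we set $\zeta([x,r],t):=\frac12\psi(x,t)r^2$ and obtain $\partial_t\zeta([x,r],t):=\frac12\partial_t\psi(x,t)r^2$.
Now defining the Borel map $D_{\mathfrak C}\zeta:\mathfrak C\times[0,1]\to(\mathbb R^{d+1})^*$ via

$$D_{\mathfrak C}\zeta(\mathfrak y,t):=\begin{cases}\big(\tfrac12r\,D_x\psi(x,t),\ r\psi(x,t)\big)&\text{for }\mathfrak y\ne\mathfrak o,\\(0,0)&\text{otherwise,}\end{cases} \tag{8.13}$$

we see that the map $t\mapsto\zeta(\mathfrak y(t),t)$ is absolutely continuous and satisfies

$$\frac{d}{dt}\zeta(\mathfrak y(t),t)=\partial_t\zeta(\mathfrak y(t),t)+\big\langle D_{\mathfrak C}\zeta(\mathfrak y(t),t),\mathfrak y'_{\mathfrak C}(t)\big\rangle_{\mathbb R^{d+1}}\quad\mathscr L^1\text{-a.e. in }(0,1).\quad□ \tag{8.14}$$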


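Note that the first component of $D_{\mathfrak C}\zeta$ contains the factor $r$ rather than $r^2$, since $\mathfrak y'_{\mathfrak C}$ in
(8.12) already has one factor $r$ in its first component.

8.2 Lifting of absolutely continuous curves and geodesics

Dynamic plans and time-dependent marginals. Let $(Z,\mathsf d_Z)$ be a complete and
separable metric space. A dynamic plan $\boldsymbol\pi$ in $Z$ is a probability measure in $\mathcal P(C(I;Z))$,
and we say that $\boldsymbol\pi$ has finite 2-energy if it is concentrated on $\mathrm{AC}^2(I;Z)$ and

$$\int\Big(\int_0^1|z'|^2_{\mathsf d_Z}(t)\,dt\Big)\,d\boldsymbol\pi(z)<\infty. \tag{8.15}$$

We denote by $\mathsf e_t$ the evaluation map in $C(I;Z)$ given by $\mathsf e_t(z):=z(t)$. If $\boldsymbol\pi$ is a dynamic
plan, $\alpha_t=(\mathsf e_t)_\sharp\boldsymbol\pi\in\mathcal P(Z)$ is its marginal at time $t\in I$ and the curve $t\mapsto\alpha_t$ belongs
to $C(I;(\mathcal P(Z),\mathsf W_{\mathsf d_Z}))$. If moreover $\boldsymbol\pi$ is a dynamic plan with finite 2-energy, then
$\alpha\in\mathrm{AC}^2(I;(\mathcal P(Z),\mathsf W_{\mathsf d_Z}))$.

We say that $\boldsymbol\pi$ is an optimal geodesic plan between $\alpha_0,\alpha_1\in\mathcal P(Z)$ if $(\mathsf e_i)_\sharp\boldsymbol\pi=\alpha_i$ for
$i=0,1$, if it is a dynamic plan concentrated on $\mathrm{Geo}(Z)$, and if

$$\int\mathsf d_Z^2(z(0),z(1))\,d\boldsymbol\pi(z)=\int\int_0^1|z'|^2_{\mathsf d_Z}\,dt\,d\boldsymbol\pi(z)=\mathsf W^2_{\mathsf d_Z}(\alpha_0,\alpha_1). \tag{8.16}$$

When $Z=\mathfrak C$ we will denote by $\mathfrak h^2_t=\mathfrak h^2\circ(\mathsf e_t)_\sharp$ the homogeneous marginal at time
$t\in I$. Since $\mathfrak h^2:\mathcal P_2(\mathfrak C)\to\mathcal M(X)$ is 1-Lipschitz (cf. Corollary 7.13), it follows that the
curve $\mu_t:=\mathfrak h^2\alpha_t=\mathfrak h^2_t\boldsymbol\pi$ belongs to $\mathrm{AC}^2(I;(\mathcal M(X),\mathsf{HK}))$ and moreover

$$|\mu'_t|^2_{\mathsf{HK}}\le\int|\mathfrak y'|^2_{\mathsf d_{\mathfrak C}}(t)\,d\boldsymbol\pi(\mathfrak y)\quad\text{for a.e. }t\in(0,1). \tag{8.17}$$

A simple consequence of this property is that $(\mathcal M(X),\mathsf{HK})$ inherits the length (or geodesic)
property of $(X,\mathsf d)$.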

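Proposition 8.3. $(\mathcal M(X),\mathsf{HK})$ is a length (resp. geodesic) space if and only if $(X,\mathsf d)$ is a
length (resp. geodesic) space.

Proof. Let us first suppose that $(X,\mathsf d)$ is a length space (the argument in the geodesic
case is completely equivalent) and let $\mu_i\in\mathcal M(X)$. By Corollary 7.7 we find $\alpha_i\in\mathcal P_2(\mathfrak C)$
such that $\mathfrak h^2\alpha_i=\mu_i$ and $\mathsf{HK}(\mu_1,\mu_2)=\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)$. Since $\mathfrak C$ is a length space, it is
well known that $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ is a length space (see [S]), so that for every $\kappa>1$ there exists
$\alpha\in\mathrm{Lip}([0,1];(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}}))$ connecting $\alpha_1$ to $\alpha_2$ such that $|\alpha'|_{\mathsf W_{\mathsf d_{\mathfrak C}}}\le\kappa\,\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha_1,\alpha_2)$. Setting
$\mu_t:=\mathfrak h^2\alpha_t$ we obtain a Lipschitz curve connecting $\mu_1$ to $\mu_2$ with length $\le\kappa\,\mathsf{HK}(\mu_1,\mu_2)$.
The converse property is a consequence of the next representation Theorem 8.4 and
the fact that if $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ is a length (resp. geodesic) space, then $\mathfrak C$ and thus $X$ are
length (resp. geodesic) spaces. □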

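We want to prove the converse representation result that every absolutely continuous
curve $\mu:[0,1]\to(\mathcal M(X),\mathsf{HK})$ can be written via a dynamic plan $\boldsymbol\pi$ as $\mu_t=\mathfrak h^2_t\boldsymbol\pi$. The
argument only depends on the metric properties of the Lipschitz submersion $\mathfrak h^2$.

Theorem 8.4. Let $(\mu_t)_{t\in[0,1]}$ be a curve in $\mathrm{AC}^p([0,1];(\mathcal M(X),\mathsf{HK}))$, $p\in[1,\infty]$, with

$$\Theta:=\sqrt{\mu_0(X)}+\int_0^1|\mu'_t|_{\mathsf{HK}}\,dt. \tag{8.18}$$

Then there exists a curve $(\alpha_t)_{t\in[0,1]}$ in $\mathrm{AC}^p([0,1];(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}}))$ such that $\alpha_t$ is concentrated
on $\mathfrak C[\Theta]$ for every $t\in[0,1]$ and

$$\mu_t=\mathfrak h^2\alpha_t\ \text{ in }[0,1],\qquad|\mu'_t|_{\mathsf{HK}}=|\alpha'_t|_{\mathsf W_{\mathsf d_{\mathfrak C}}}\ \text{ for a.e. }t\in(0,1). \tag{8.19}$$

Moreover, when $p=2$, there exists a dynamic plan $\boldsymbol\pi\in\mathcal P(\mathrm{AC}^2([0,1];\mathfrak C))$ such that

$$\begin{aligned}&\alpha_t=(\mathsf e_t)_\sharp\boldsymbol\pi,\qquad\mu_t=\mathfrak h^2_t\boldsymbol\pi=\mathfrak h^2\alpha_t\ \text{ in }[0,1],\\&|\mu'_t|^2_{\mathsf{HK}}=|\alpha'_t|^2_{\mathsf W_{\mathsf d_{\mathfrak C}}}=\int|\mathfrak y'|^2_{\mathsf d_{\mathfrak C}}(t)\,d\boldsymbol\pi(\mathfrak y)\ \text{ for a.e. }t\in(0,1).\end{aligned} \tag{8.20}$$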

Proof. By Lisini's lifting Theorem [30, Theorem 5], (8.20) is a consequence of the first part
of the statement and (8.19) in the case $p=2$. It is therefore sufficient to prove that for
a given $\mu\in\mathrm{AC}([0,1];(\mathcal M(X),\mathsf{HK}))$ there exists a curve $\alpha\in\mathrm{AC}([0,1];(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}}))$ such
that $\mu_t=\mathfrak h^2(\alpha_t)$ and $|\mu'_t|=|\alpha'_t|$ a.e. in $(0,1)$. By a standard reparametrization technique,
we may assume that $\mu$ is Lipschitz continuous and $|\mu'_t|\equiv L$.

We divide the interval $I=[0,1]$ into $2^N$ intervals of size $2^{-N}$, namely $I^N_i:=[t^N_{i-1},t^N_i]$
with $t^N_i:=i\,2^{-N}$ for $i=1,\dots,2^N$. Setting $\mu^N_i:=\mu_{t^N_i}$ we can apply the Gluing Lemma
7.11 (starting from $i=0$ to $2^N$) to obtain measures $\alpha^N_i\in\mathcal P_2(\mathfrak C)$ such that

$$\mathfrak h^2(\alpha^N_i)=\mu^N_i,\qquad\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha^N_i,\alpha^N_{i+1})=\mathsf{HK}(\mu^N_i,\mu^N_{i+1})\le L2^{-N}, \tag{8.21}$$

and concentrated on $\mathfrak C[\Theta_N]$ where

$$\Theta_N=\sqrt{\mu_0(X)}+\sum_{i=1}^{2^N}\mathsf{HK}(\mu^N_{i-1},\mu^N_i)\le\Theta.$$

Thus if $t$ is a dyadic point, we obtain a sequence of probability measures $\alpha^N(t)\in\mathcal P_2(\mathfrak C)$
concentrated on $\mathfrak C[\Theta]$ with $\mathfrak h^2(\alpha^N(t))=\mu_t$ and such that $\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha^N(t),\alpha^N(s))\le L|t-s|$ if
$s=m2^{-N}$ and $t=n2^{-N}$ are dyadic points in the same grid. By the compactness Lemma
7.3 and a standard diagonal argument, we can extract a subsequence $N(k)$ such that
$\alpha^{N(k)}(t)$ converges to $\alpha(t)$ in $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$ for every dyadic point $t$. Since $\mathsf W_{\mathsf d_{\mathfrak C}}(\alpha(s),\alpha(t))\le L|t-s|$
for every dyadic $s,t$, we can extend $\alpha$ to an $L$-Lipschitz curve, still denoted by $\alpha$,
which satisfies $\mathfrak h^2(\alpha(t))=\mu_t$. Since $\mathfrak h^2$ is 1-Lipschitz, we conclude that $|\alpha'|(t)=|\mu'_t|$ a.e.
in $(0,1)$. □

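Corollary 8.5. Let $(\mu_t)_{t\in[0,1]}$ be a curve in $\mathrm{AC}^2([0,1];(\mathcal M(X),\mathsf{HK}))$ and let $\Theta$ be as in (8.18).

Then there exists a dynamic plan $\tilde{\boldsymbol\pi}$ in $\mathcal P(\tilde C([0,1];Y))$ concentrated on $\widetilde{\mathrm{AC}}^2([0,1];Y)$ such
that $\alpha_t=(\mathsf e_t)_\sharp\tilde{\boldsymbol\pi}$ is concentrated in $X\times[0,\Theta]$, that $\mu_t=\mathfrak h^2((\mathsf e_t)_\sharp\tilde{\boldsymbol\pi})$, and that

$$|\mu'_t|^2_{\mathsf{HK}}=\int|\boldsymbol y'|^2(t)\,d\tilde{\boldsymbol\pi}(\boldsymbol y)\quad\text{for a.e. }t\in[0,1], \tag{8.22}$$

where $|\boldsymbol y'|$ is defined in (8.9).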

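Another important consequence of the previous representation result is a precise
characterization of the geodesics in $(\mathcal M(X),\mathsf{HK})$.

Theorem 8.6 (Geodesics in $(\mathcal M(X),\mathsf{HK})$).

(i) If $(\mu_t)_{t\in[0,1]}$ is a geodesic in $(\mathcal M(X),\mathsf{HK})$ then there exists an optimal geodesic plan
$\boldsymbol\pi$ in $\mathcal P(\mathrm{Geo}(\mathfrak C))$ (recall (8.16)) such that

(a) $\boldsymbol\pi$-a.e. curve $\mathfrak y$ is a geodesic in $\mathfrak C$,

(b) $[0,1]\ni t\mapsto\alpha_t:=(\mathsf e_t)_\sharp\boldsymbol\pi$ is a geodesic in $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$, where all $\alpha_t$ are
concentrated on $\mathfrak C[\Theta]$ with $\Theta^2=2\big(\mu_0(X)+\mathsf{HK}^2(\mu_0,\mu_1)\big)$,

(c) $\mu_t=\mathfrak h^2_t\boldsymbol\pi=\mathfrak h^2\alpha_t$ for every $t\in[0,1]$, and

(d) $(\mathsf e_s,\mathsf e_t)_\sharp\boldsymbol\pi\in\mathrm{Opt}_{\mathsf{HK}}(\mu_s,\mu_t)$ if $0\le s<t\le1$.

(ii) If $(X,\mathsf d)$ is a geodesic space, for every $\mu_0,\mu_1\in\mathcal M(X)$ and every $\alpha\in\mathrm{Opt}_{\mathsf{HK}}(\mu_0,\mu_1)$
there exists an optimal geodesic plan $\boldsymbol\pi\in\mathcal P(\mathrm{Geo}(\mathfrak C))$ such that $(\mathsf e_0,\mathsf e_1)_\sharp\boldsymbol\pi=\alpha$.

Proof. The statement (i) is an immediate consequence of Theorem 8.4.

Statement (ii) is a well known property of the Kantorovich-Wasserstein space $(\mathcal P_2(\mathfrak C),\mathsf W_{\mathsf d_{\mathfrak C}})$
in the case when $\mathfrak C$ is geodesic. □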

Theorem 8.4 also clarifies the relation between $\mathsf{HK}$ and $\mathsf{GHK}$ introduced in Section 7.

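Corollary 8.7. If $(X, \mathsf d)$ is separable and complete then $\mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{GHK}))$ coincides
with $\mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{HK}))$ and for every curve $\mu \in \mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{GHK}))$ we
have

$$|\mu'|_{\mathsf{GHK}}(t) = |\mu'|_{\mathsf{HK}}(t) \quad\text{for } \mathscr L^1\text{-a.e. } t \in [0,1]. \tag{8.23}$$

In particular, if $(X, \mathsf d)$ is a length metric space then $\mathsf{HK}$ is the length distance generated by
$\mathsf{GHK}$.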

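Proof. Since $\mathsf{GHK} \le \mathsf{HK}$ it is clear that $\mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{HK})) \subset \mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{GHK}))$.

In order to prove the opposite inclusion and (8.23) it is sufficient to notice that the
classes of absolutely continuous curves in $\mathfrak C$ w.r.t. $\mathsf d_{\mathfrak C}$ and $\mathsf g_{\mathfrak C}$ coincide with equal metric
derivatives $|\mathfrak y'|_{\mathsf d_{\mathfrak C}} = |\mathfrak y'|_{\mathsf g_{\mathfrak C}}$. Since $\mathsf{GHK} = \mathsf{HK}_{\mathsf g}$ is the Hellinger-Kantorovich distance induced
by $\mathsf g$, the assertion follows by (8.20) of Theorem 8.4. □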

8.3 Lower curvature bound in the sense of Alexandrov 

Let us first recall two possible definitions of Positively Curved (PC) spaces in the sense of
Alexandrov, referring to [5] and to CD] for other equivalent definitions and for the more 
general case of spaces with curvature $\ge k$.

According to Sturm [43], a metric space $(Z, \mathsf d_Z)$ is a Positively Curved (PC) metric
space in the large if for every choice of points $z_0, z_1, \dots, z_N \in Z$ and coefficients
$\lambda_1, \dots, \lambda_N \in (0, +\infty)$ we have

$$\sum_{i,j=1}^N \lambda_i \lambda_j\, \mathsf d_Z^2(z_i, z_j) \le 2 \sum_{i,j=1}^N \lambda_i \lambda_j\, \mathsf d_Z^2(z_0, z_j). \tag{8.24}$$
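Inequality (8.24) can be probed numerically in the simplest positively curved example, the Euclidean plane. The following is only an illustrative sketch, not part of the original argument: the function names, the sampling ranges and the random seed are assumptions made here.

```python
import random

def sturm_gap(z0, points, weights):
    """Right-hand side minus left-hand side of (8.24) for points of the Euclidean plane."""
    def d2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    lhs = sum(wi * wj * d2(pi, pj)
              for wi, pi in zip(weights, points)
              for wj, pj in zip(weights, points))
    rhs = 2.0 * sum(wi * wj * d2(z0, pj)
                    for wi in weights
                    for wj, pj in zip(weights, points))
    return rhs - lhs

def min_gap_over_random_samples(trials=200, n=5, seed=0):
    """Smallest gap found over random configurations (z_0, z_1..z_n, lambda_1..lambda_n)."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(trials):
        z0 = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
        lam = [rng.uniform(0.1, 2.0) for _ in range(n)]
        worst = min(worst, sturm_gap(z0, pts, lam))
    return worst
```

In the Euclidean case the gap equals $2\,|\sum_i \lambda_i (z_i - z_0)|^2$, so it is nonnegative up to rounding, which is what the sampled minimum reflects.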

If every point of $Z$ has a neighborhood that is PC, then we say that $Z$ is locally positively
curved.

When the space $Z$ is geodesic, the above (local and global) definitions coincide with
the corresponding one given by Alexandrov, which is based on triangle comparison: for
every choice of $z_0, z_1, z_2 \in Z$, every $t \in [0,1]$, and every point $z_t$ such that $\mathsf d_Z(z_t, z_k) =
|k - t|\,\mathsf d_Z(z_0, z_1)$ for $k = 0, 1$ we have

$$\mathsf d_Z^2(z_2, z_t) \ge (1-t)\,\mathsf d_Z^2(z_2, z_0) + t\,\mathsf d_Z^2(z_2, z_1) - 2t(1-t)\,\mathsf d_Z^2(z_0, z_1). \tag{8.25}$$

When $Z$ is also complete, the local and the global definitions are equivalent. Next we
provide conditions on $(X, \mathsf d)$ or $(\mathfrak C, \mathsf d_{\mathfrak C})$ that guarantee that $(\mathcal M(X), \mathsf{HK})$ is a PC space.

Theorem 8.8. Let $(X, \mathsf d)$ be a metric space.

(i) If $X \subset \mathbb R$ is convex (i.e. an interval) endowed with the standard distance, then
$(\mathcal M(X), \mathsf{HK})$ is a PC space.

(ii) If $(\mathfrak C, \mathsf d_{\mathfrak C})$ is a PC space in the large, cf. (8.24), then $(\mathcal M(X), \mathsf{HK})$ is a PC space.

(iii) If $(X, \mathsf d)$ is separable, complete and geodesic, then $(\mathcal M(X), \mathsf{HK})$ is a PC space if and
only if $(X, \mathsf d)$ has locally curvature $\ge 1$.

Before we go into the proof of this result, we highlight that for a compact convex
subset $\Omega \subset \mathbb R^d$ with $d \ge 2$ equipped with the Euclidean distance, the space $(\mathcal M(\Omega), \mathsf{HK})$
is not PC, see m Sect. 5.6] for an explicit construction showing the semi concavity of the 
squared distance fails. 

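Proof. Let us first prove statement (ii). If $(\mathfrak C, \mathsf d_{\mathfrak C})$ is a PC space then also $(\mathcal P_2(\mathfrak C), \mathsf W_{\mathsf d_{\mathfrak C}})$ is
a PC space [5]. Applying Corollary 7.13, for every choice of $\mu_i \in \mathcal M(X)$, $i = 0, \dots, N$,
we can then find measures $\beta_i \in \mathcal P_2(\mathfrak C)$ with $\mathfrak h^2 \beta_i = \mu_i$ such that

$$\mathsf W_{\mathsf d_{\mathfrak C}}(\beta_0, \beta_i) = \mathsf{HK}(\mu_0, \mu_i) \quad\text{for } i = 1, \dots, N, \tag{8.26}$$

where it is crucial that $\beta_0$ is the same for every $i$. It then follows that

$$\sum_{i,j=1}^N \lambda_i\lambda_j\, \mathsf{HK}^2(\mu_i, \mu_j) \le \sum_{i,j=1}^N \lambda_i\lambda_j\, \mathsf W^2_{\mathsf d_{\mathfrak C}}(\beta_i, \beta_j) \le 2 \sum_{i,j=1}^N \lambda_i\lambda_j\, \mathsf W^2_{\mathsf d_{\mathfrak C}}(\beta_0, \beta_j) = 2 \sum_{i,j=1}^N \lambda_i\lambda_j\, \mathsf{HK}^2(\mu_0, \mu_j).$$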


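Let us now consider (iii) "$\Rightarrow$": If $(\mathcal M(X), \mathsf{HK})$ is PC, we have to prove that $(X, \mathsf d)$ has
locally curvature $\ge 1$. By [5, Thm. 4.7.1] it is sufficient to prove that $\mathfrak C \setminus \{\mathfrak o\}$
is locally PC to conclude that $(X, \mathsf d)$ has locally curvature $\ge 1$. We thus select points
$\mathfrak y_i = [x_i, r_i]$, $i = 0, 1, 2$, in a sufficiently small neighborhood of $\mathfrak y = [x, r]$ with $r > 0$, so
that $\mathsf d(x_i, x_j) < \pi/2$ for every $i, j$ and $r_i, r_j > 0$. We also consider a geodesic $\mathfrak y_t = [x_t, s_t]$,
$t \in [0,1]$, connecting $\mathfrak y_0$ to $\mathfrak y_1$, thus satisfying $\mathsf d_{\mathfrak C}(\mathfrak y_i, \mathfrak y_t) = |i - t|\,\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_1)$ for $i = 0, 1$.
Setting $\mu_i := r_i^2 \delta_{x_i}$, $\mu_t := s_t^2 \delta_{x_t}$, it is easy to check (cf. [27, Sect. 3.3.1]) that

$$\mathsf{HK}(\mu_i, \mu_j) = \mathsf d_{\mathfrak C}(\mathfrak y_i, \mathfrak y_j) \quad\text{for } i, j \in \{0, 1, 2\},$$
$$\mathsf{HK}(\mu_t, \mu_k) = |k - t|\,\mathsf{HK}(\mu_0, \mu_1) \quad\text{for } k \in \{0, 1\}.$$

We can thus apply (8.25) to $\mu_0, \mu_1, \mu_2, \mu_t$ and obtain the corresponding inequality for
$\mathfrak y_0, \mathfrak y_1, \mathfrak y_2, \mathfrak y_t$.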

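(iii) "$\Leftarrow$": In order to prove the converse property we apply Remark 7.12. For
$\mu_0, \mu_1, \mu_2, \mu_3 = \mu_t \in \mathcal M(X)$ with $t \in [0,1]$ and $\mathsf{HK}(\mu_3, \mu_k) = |k - t|\,\mathsf{HK}(\mu_0, \mu_1)$, we find
a plan $\alpha \in \mathcal P(\mathfrak C_0 \times \mathfrak C_1 \times \mathfrak C_2 \times \mathfrak C_3)$ (with the usual convention to use copies of $\mathfrak C$) such
that

$$\mathsf{HK}^2(\mu_i, \mu_j) = \int \mathsf d_{\mathfrak C}^2(\mathfrak y_i, \mathfrak y_j)\,\mathrm d\alpha \quad\text{for } (i,j) \in A = \{(0,3), (1,3), (2,3)\}. \tag{8.28}$$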

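The triangle inequality, the elementary inequality $t(1-t)(a+b)^2 \le (1-t)a^2 + tb^2$, and
the very definition of $\mathsf{HK}$ yield for $t \in (0,1)$ the estimate

$$
\begin{aligned}
t(1-t)\,\mathsf{HK}^2(\mu_0, \mu_1) &\le t(1-t) \int \mathsf d_{\mathfrak C}^2(\mathfrak y_0, \mathfrak y_1)\,\mathrm d\alpha \le \int t(1-t)\bigl(\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_3) + \mathsf d_{\mathfrak C}(\mathfrak y_3, \mathfrak y_1)\bigr)^2\,\mathrm d\alpha \\
&\le \int \bigl((1-t)\,\mathsf d_{\mathfrak C}^2(\mathfrak y_0, \mathfrak y_3) + t\,\mathsf d_{\mathfrak C}^2(\mathfrak y_3, \mathfrak y_1)\bigr)\,\mathrm d\alpha = (1-t)\,\mathsf{HK}^2(\mu_0, \mu_3) + t\,\mathsf{HK}^2(\mu_3, \mu_1) \\
&= t(1-t)\,\mathsf{HK}^2(\mu_0, \mu_1).
\end{aligned}
$$

This series of inequalities shows in particular that

$$(1-t)\,\mathsf d_{\mathfrak C}^2(\mathfrak y_0, \mathfrak y_3) + t\,\mathsf d_{\mathfrak C}^2(\mathfrak y_3, \mathfrak y_1) = t(1-t)\bigl(\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_3) + \mathsf d_{\mathfrak C}(\mathfrak y_3, \mathfrak y_1)\bigr)^2 = t(1-t)\,\mathsf d_{\mathfrak C}^2(\mathfrak y_0, \mathfrak y_1) \quad \alpha\text{-a.e.},$$

so that

$$\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_3) = t\,\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_1) \quad\text{and}\quad \mathsf d_{\mathfrak C}(\mathfrak y_3, \mathfrak y_1) = (1-t)\,\mathsf d_{\mathfrak C}(\mathfrak y_0, \mathfrak y_1) \quad \alpha\text{-a.e.}$$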
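The elementary inequality invoked in this step, together with its equality case, can be checked by completing the square. The short computation below is only a sketch of that standard step, for arbitrary real numbers $a, b \ge 0$ and $t \in (0,1)$:

```latex
\[
(1-t)\,a^2 + t\,b^2 - t(1-t)\,(a+b)^2 = \bigl((1-t)\,a - t\,b\bigr)^2 \ge 0 .
\]
```

Equality therefore forces $(1-t)a = tb$, i.e. $a = t(a+b)$ and $b = (1-t)(a+b)$, which is exactly the splitting of the two distances recorded above.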

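Moreover, $(\pi^0, \pi^1)_\sharp \alpha \in \mathrm{Opt}_{\mathsf{HK}}(\mu_0, \mu_1)$, so that (8.28) holds for $(i,j) \in A' = A \cup \{(0,1)\}$.

By Theorem 7.20 we deduce that

$$\mathsf d(x_i, x_j) \le \pi/2 \quad \alpha\text{-a.e. for } (i,j) \in A'.$$

If one of the points $\mathfrak y_i$, $i = 0, 1, 2$, is the vertex $\mathfrak o$, then it is not difficult to check by a
direct computation that

$$\mathsf d_{\mathfrak C}^2(\mathfrak y_2, \mathfrak y_3) \ge (1-t)\,\mathsf d_{\mathfrak C}^2(\mathfrak y_2, \mathfrak y_0) + t\,\mathsf d_{\mathfrak C}^2(\mathfrak y_2, \mathfrak y_1) - 2t(1-t)\,\mathsf d_{\mathfrak C}^2(\mathfrak y_0, \mathfrak y_1). \tag{8.29}$$

When $\mathfrak y_i \in \mathfrak C \setminus \{\mathfrak o\}$ for every $i = 0, 1, 2$, we use $\mathsf d(x_0, x_1) + \mathsf d(x_1, x_2) + \mathsf d(x_2, x_0) \le \tfrac32\pi < 2\pi$, and
[9, Thm. 4.7.1] yields (8.29) because of the assumption that $X$ is PC. Integrating
(8.29) w.r.t. $\alpha$, by taking into account (8.28), the fact that $(\pi^0, \pi^1)_\sharp \alpha \in \mathrm{Opt}_{\mathsf{HK}}(\mu_0, \mu_1)$,
and that

$$\int \mathsf d_{\mathfrak C}^2(\mathfrak y_2, \mathfrak y_i)\,\mathrm d\alpha \ge \mathsf{HK}^2(\mu_2, \mu_i) \quad\text{for } i = 0, 1,$$

we obtain

$$\mathsf{HK}^2(\mu_2, \mu_3) \ge (1-t)\,\mathsf{HK}^2(\mu_2, \mu_0) + t\,\mathsf{HK}^2(\mu_2, \mu_1) - 2t(1-t)\,\mathsf{HK}^2(\mu_0, \mu_1).$$

Finally, statement (i) is just a particular case of (iii). □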

As simple applications of the Theorem above we obtain that $\mathcal M(\mathbb R)$ and $\mathcal M(\mathbb S^{d-1})$
endowed with $\mathsf{HK}$ are Positively Curved spaces.

8.4 Duality and Hamilton-Jacobi equation

In this section we will show the intimate connections of the duality formula of Theorem
7.21 with Lipschitz subsolutions of the Hamilton-Jacobi equation in $X \times (0,1)$ given by

$$\partial_t \xi_t + \tfrac12 |\mathrm D_X \xi_t|^2 + 2\xi_t^2 = 0 \tag{8.30}$$

and its counterpart in the cone space

$$\partial_t \zeta_t + \tfrac12 |\mathrm D_{\mathfrak C} \zeta_t|^2 = 0. \tag{8.31}$$

Indeed, the first derivation of $\mathsf{HK}$ via $\mathsf{LET}$ was obtained by solving (8.30) for $X = \mathbb R^d$, see
the remarks on the chronological development in Section A.

At a formal level, it is not difficult to check that solutions to (8.30) correspond to the
special class of solutions to (8.31) of the form

$$\zeta_t([x,r]) := \xi_t(x)\,r^2. \tag{8.32}$$

Indeed, still on the formal level we have the formula

$$|\mathrm D_{\mathfrak C} \zeta|^2 = \frac{1}{r^2}|\mathrm D_X \zeta|^2 + |\partial_r \zeta|^2 = |\mathrm D_X \xi|^2 r^2 + 4\xi^2 r^2 \quad\text{if } \zeta = \xi r^2. \tag{8.33}$$

Since the Kantorovich-Wasserstein distance on $\mathcal P_2(\mathfrak C)$ can be defined in duality with
subsolutions to (8.31) via the Hopf-Lax formula and 2-homogeneous marginals are modeled on
test functions as in (8.32), we can expect to obtain a dual representation for the Hellinger-
Kantorovich distance on $\mathcal M(X)$ by studying the Hopf-Lax formula for initial data of the
form $\zeta_0([x,r]) = \xi_0(x)\,r^2$.
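At the same formal level, the correspondence between (8.30) and (8.31) can be written out explicitly. The computation below is a sketch which assumes that the cone metric has the usual polar form, so that $|\mathrm D_{\mathfrak C}\zeta|^2 = |\partial_r \zeta|^2 + r^{-2}|\mathrm D_X \zeta|^2$ for $r > 0$, as in (8.33):

```latex
\[
\begin{aligned}
\zeta_t([x,r]) = \xi_t(x)\,r^2
 \quad&\Longrightarrow\quad
 \partial_r \zeta_t = 2\,\xi_t(x)\,r, \qquad
 |\mathrm D_X \zeta_t| = |\mathrm D_X \xi_t|(x)\,r^2, \\
\partial_t \zeta_t + \tfrac12\,|\mathrm D_{\mathfrak C}\zeta_t|^2
 &= \Bigl(\partial_t \xi_t + \tfrac12\,|\mathrm D_X \xi_t|^2 + 2\,\xi_t^2\Bigr)(x)\; r^2 .
\end{aligned}
\]
```

Hence, for $r > 0$, a function of the form $\xi_t(x) r^2$ formally solves (8.31) exactly when $\xi$ solves (8.30); the factor $2\xi^2$ in (8.30) is the contribution of the radial derivative.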


Slope and asymptotic Lipschitz constant. In order to give a metric interpretation
to (8.30) and (8.31), let us first recall that for a locally Lipschitz function $f : Z \to \mathbb R$
defined in a metric space $(Z, \mathsf d_Z)$ the metric slope $|\mathrm D_Z f|$ and the asymptotic Lipschitz
constant $|\mathrm D_Z f|_a$ are defined by

$$|\mathrm D_Z f|(z) := \limsup_{x \to z} \frac{|f(x) - f(z)|}{\mathsf d_Z(x,z)}, \qquad |\mathrm D_Z f|_a(z) := \limsup_{\substack{x,y \to z \\ x \ne y}} \frac{|f(y) - f(x)|}{\mathsf d_Z(x,y)}, \tag{8.34}$$

with the convention that $|\mathrm D_Z f|(z) = |\mathrm D_Z f|_a(z) = 0$ whenever $z$ is an isolated point.
$|\mathrm D_Z f|_a$ can also be defined as the minimal constant $L \ge 0$ such that there exists a
function $G_L : Z \times Z \to [0, \infty)$ satisfying

$$|f(x) - f(y)| \le G_L(x,y)\,\mathsf d_Z(x,y), \qquad \limsup_{x,y \to z} G_L(x,y) \le L. \tag{8.35}$$

Note that $|\mathrm D_Z f|_a$ is always an upper semicontinuous function. When $Z$ is a length space,
$|\mathrm D_Z f|_a$ is the upper semicontinuous envelope of the metric slope $|\mathrm D_Z f|$. We will often
write $|\mathrm D f|$, $|\mathrm D f|_a$ whenever the space $Z$ will be clear from the context.
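The two quantities in (8.34) can differ. The following sketch, which is an illustration added here and not part of the source, uses the hypothetical example $f(x) = x \sin(\log|x|)$ on $Z = \mathbb R$ (with $f(0) = 0$): at $z = 0$ the metric slope equals $1$, while the asymptotic Lipschitz constant equals $\sqrt 2$, the supremum of $|f'| = |\sin(\log|x|) + \cos(\log|x|)|$ near $0$. The sampling strategy is an assumption chosen for simplicity.

```python
import math

def f(x):
    # f(x) = x sin(log|x|), extended by f(0) = 0; f is Lipschitz near 0
    return x * math.sin(math.log(abs(x))) if x != 0.0 else 0.0

def slope_at_zero(radius, samples=4000):
    # sampled version of limsup_{x -> 0} |f(x) - f(0)| / |x|, using 0 < x <= radius
    best = 0.0
    for k in range(samples):
        x = radius * math.exp(-2.0 * math.pi * k / samples)  # one full period of sin(log x)
        best = max(best, abs(f(x) - f(0.0)) / x)
    return best

def asymptotic_lipschitz_at_zero(radius, samples=4000):
    # sampled version of limsup_{x,y -> 0, x != y} |f(y) - f(x)| / |y - x| with very close pairs
    best = 0.0
    for k in range(samples):
        x = radius * math.exp(-2.0 * math.pi * k / samples)
        y = x * (1.0 - 1e-6)
        best = max(best, abs(f(y) - f(x)) / abs(y - x))
    return best
```

Both difference quotients are invariant under the scaling $x \mapsto e^{-2\pi} x$, so the sampled values do not depend on the chosen radius; this is why a single period suffices in the sketch.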




















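Remark 8.9. The notion of locally Lipschitz function and the value $|\mathrm D_Z f|_a$ do not
change if we replace the distance $\mathsf d_Z$ with a distance $\tilde{\mathsf d}_Z$ of the form

$$\tilde{\mathsf d}_Z(z_1, z_2) := h(\mathsf d_Z(z_1, z_2)) \quad\text{for } z_1, z_2 \in Z, \quad\text{with } h : [0,\infty) \to [0,\infty) \text{ concave and } \lim_{r \downarrow 0} \frac{h(r)}{r} = 1. \tag{8.36}$$

In particular, the truncated distances $\mathsf d_Z \wedge \kappa$ with $\kappa > 0$, the distances $a \sin((\mathsf d_Z \wedge \kappa)/a)$
with $a > 0$ and $\kappa \in (0, a\pi/2]$, and the distance $\mathsf g$ given by (7.72) yield the same
asymptotic Lipschitz constant.

In the case of the cone space $\mathfrak C$ it is not difficult to see that the distances $\mathsf d_{\mathfrak C}$ and $\mathsf d_{\pi/2,\mathfrak C}$
coincide in suitably small neighborhoods of every point $\mathfrak y \in \mathfrak C \setminus \{\mathfrak o\}$, so that they induce
the same asymptotic Lipschitz constants in $\mathfrak C \setminus \{\mathfrak o\}$. The same property holds for $\mathsf g_{\mathfrak C}$. In
the case of the vertex $\mathfrak o$, relation (7.11) yields

$$|\mathrm D_{\mathsf d_{\mathfrak C}} f|_a(\mathfrak o) \le |\mathrm D_{\mathsf d_{\pi/2,\mathfrak C}} f|_a(\mathfrak o) \le \sqrt 2\, |\mathrm D_{\mathsf d_{\mathfrak C}} f|_a(\mathfrak o). \qquad □ \tag{8.37}$$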

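The next result shows that the asymptotic Lipschitz constant satisfies formula (8.33)
for $\zeta([x,r]) = \xi(x)\,r^2$.

Lemma 8.10. For $\xi : X \to \mathbb R$ let $\zeta : \mathfrak C \to \mathbb R$ be defined by $\zeta([x,r]) := \xi(x)\,r^2$.

(i) If $\zeta$ is $\mathsf d_{\mathfrak C}$-Lipschitz in $\mathfrak C[R]$, then $\xi \in \mathrm{Lip}_b(X)$ with

$$\sup_X |\xi| \le \frac{1}{R^2} \sup_{\mathfrak C[R]} |\zeta| \le \frac1R\, \mathrm{Lip}(\zeta, \mathfrak C[R]) \quad\text{and}\quad \mathrm{Lip}(\xi, X) \le \frac1R\, \mathrm{Lip}(\zeta, \mathfrak C[R]). \tag{8.38}$$

(ii) If $\xi \in \mathrm{Lip}_b(X)$, then $\zeta$ is $\mathsf d_{\mathfrak C}$-Lipschitz in $\mathfrak C[R]$ for every $R > 0$ with

$$\sup_{\mathfrak C[R]} |\zeta| \le R^2 \sup_X |\xi| \quad\text{and}\quad \mathrm{Lip}^2(\zeta, \mathfrak C[R]) \le R^2 \Bigl( \mathrm{Lip}^2(\xi, (X, \tilde{\mathsf d})) + 4 \sup_X |\xi|^2 \Bigr), \tag{8.39}$$

where $\tilde{\mathsf d} := 2 \sin(\mathsf d_\pi / 2)$.

(iii) In the cases (i) or (ii) we have, for every $x \in X$ and $r \ge 0$, the relation

$$|\mathrm D_{\mathfrak C} \zeta|_a^2([x,r]) = \begin{cases} \bigl( |\mathrm D_X \xi|_a^2(x) + 4\xi^2(x) \bigr)\, r^2 & \text{for } r > 0, \\ 0 & \text{for } r = 0. \end{cases} \tag{8.40}$$

The analogous formula holds for the metric slope $|\mathrm D_{\mathfrak C} \zeta|([x,r])$. Moreover, equation
(8.40) remains true if $\mathsf d_{\mathfrak C}$ is replaced by the distance $\mathsf d_{\pi/2,\mathfrak C}$.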


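Proof. As usual we set $\mathfrak y_i = [x_i, r_i]$ and $\mathfrak y = [x, r]$.

Let us first check statement (i). If $\zeta$ is locally Lipschitz then $|\xi(x)| = R^{-2} |\zeta([x,R]) -
\zeta([x,0])| \le R^{-1}\, \mathrm{Lip}(\zeta, \mathfrak C[R])$ for every $R$ sufficiently small, so that $\xi$ is uniformly bounded.
Moreover, using (7.3) for every $R > 0$ we have

$$|\xi(x_1) - \xi(x_2)| \le R^{-2} |\zeta([x_1,R]) - \zeta([x_2,R])| \le R^{-1}\, \mathrm{Lip}(\zeta; \mathfrak C[R])\, \tilde{\mathsf d}(x_1, x_2) \le R^{-1}\, \mathrm{Lip}(\zeta; \mathfrak C[R])\, \mathsf d(x_1, x_2),$$

so that $\xi$ is uniformly Lipschitz and (8.38) holds.

Concerning (ii), for $\xi \in \mathrm{Lip}_b(X)$ we set $S := \sup_X |\xi|$ and $L := \mathrm{Lip}(\xi, (X, \tilde{\mathsf d}))$ and use
the identity

$$\zeta(\mathfrak y_1) - \zeta(\mathfrak y_2) = (\xi(x_1) - \xi(x_2))\, r_1 r_2 + 2\xi(x)\, r\, (r_1 - r_2) + \omega(\mathfrak y_1, \mathfrak y_2; \mathfrak y)(r_1 - r_2), \tag{8.41}$$

where $\omega(\mathfrak y_1, \mathfrak y_2; \mathfrak y) := \xi(x_1) r_1 + \xi(x_2) r_2 - 2 r\, \xi(x)$ with $\lim_{\mathfrak y_1, \mathfrak y_2 \to \mathfrak y} \omega(\mathfrak y_1, \mathfrak y_2; \mathfrak y) = 0$.

Since $|\omega(\mathfrak y_1, \mathfrak y_2; \mathfrak o)| \le 2RS$ if $\mathfrak y_i \in \mathfrak C[R]$, equation (8.41) with $r = 0$ yields

$$|\zeta(\mathfrak y_1) - \zeta(\mathfrak y_2)| \le L\, \tilde{\mathsf d}(x_1, x_2)\, r_1 r_2 + 2RS\, |r_1 - r_2| \le R\, (L^2 + 4S^2)^{1/2}\, \mathsf d_{\mathfrak C}(\mathfrak y_1, \mathfrak y_2).$$

Letting $R \downarrow 0$ the inequality above also proves (8.40) in the case $r = 0$.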

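In order to prove (8.40) when $r \ne 0$ let us set $L_{\mathfrak C} := |\mathrm D_{\mathfrak C} \zeta|_a([x,r])$, $L_X := |\mathrm D_X \xi|_a(x)$,
and let $G_L$ be a function satisfying (8.35) with respect to the distance $\tilde{\mathsf d}$ (see Remark 8.9).
Equation (8.41) yields, for all $\mathfrak y = [x,r]$, the relation

$$
\begin{aligned}
|\zeta(\mathfrak y_1) - \zeta(\mathfrak y_2)| &\le G_L(x_1, x_2)\, \tilde{\mathsf d}(x_1, x_2)\, r_1 r_2 + \bigl( 2|\xi(x)|\, r + |\omega(\mathfrak y_1, \mathfrak y_2; \mathfrak y)| \bigr)\, |r_1 - r_2| \\
&\le \Bigl( G_L^2(x_1, x_2)\, r_1 r_2 + \bigl( 2|\xi(x)|\, r + |\omega(\mathfrak y_1, \mathfrak y_2; \mathfrak y)| \bigr)^2 \Bigr)^{1/2} \mathsf d_{\mathfrak C}(\mathfrak y_1, \mathfrak y_2).
\end{aligned}
$$

Passing to the limit $\mathfrak y_1, \mathfrak y_2 \to \mathfrak y$ and using the fact that $x_1, x_2 \to x$ due to $r \ne 0$, we obtain

$$L_{\mathfrak C} \le r \bigl( L_X^2 + 4|\xi(x)|^2 \bigr)^{1/2}.$$

In order to prove the converse inequality we observe that for every $L' < L_X$ there exist two
sequences of points $(x_{i,n})_{n \in \mathbb N}$, $i = 1, 2$, converging to $x$ w.r.t. $\mathsf d$ such that $\xi(x_{1,n}) - \xi(x_{2,n}) \ge L' \delta_n$,
where $0 < \delta_n := \tilde{\mathsf d}(x_{1,n}, x_{2,n}) \to 0$. Choosing $r_{1,n} := r$ and $r_{2,n} = r(1 + \lambda \delta_n)$ for an
arbitrary constant $\lambda \in \mathbb R$ with the same sign as $\xi(x)$, we can apply (8.41) and arrive at

$$L_{\mathfrak C} \ge \liminf_{n \to \infty} \frac{|\zeta(\mathfrak y_{1,n}) - \zeta(\mathfrak y_{2,n})|}{\mathsf d_{\mathfrak C}(\mathfrak y_{1,n}, \mathfrak y_{2,n})} \ge \liminf_{n \to \infty} \frac{L' \delta_n r^2 + 2|\xi(x)|\, r^2 |\lambda| \delta_n + o(\delta_n)}{r\, \delta_n \sqrt{\lambda^2 + 1}} = r\, \frac{L' + 2|\xi(x)|\, |\lambda|}{\sqrt{\lambda^2 + 1}}.$$

Optimizing with respect to $\lambda$ we obtain

$$L_{\mathfrak C}^2 \ge r^2 \bigl( (L')^2 + 4|\xi(x)|^2 \bigr), \quad\text{where } L' < L_X \text{ is arbitrary.}$$

This proves (8.40) for the asymptotic Lipschitz constant $|\mathrm D_{\mathfrak C} \zeta|_a$. The arguments for
proving (8.40) for metric slopes $|\mathrm D_{\mathfrak C} \zeta|$ are completely analogous. □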
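Formula (8.40) can be tested numerically in a model case. The sketch below is an illustration added here under explicit assumptions: $X$ is an interval of the real line of length less than $\pi$, so that the cone over $X$ is isometric to a planar sector in polar coordinates, and $\xi$ is a smooth sample function chosen arbitrarily; in this setting the slope and the asymptotic Lipschitz constant both reduce to the Euclidean gradient norm.

```python
import math

def xi(theta):
    # hypothetical smooth sample function on the base space X (an interval of angles)
    return 0.3 + 0.5 * math.sin(theta)

def dxi(theta):
    # exact derivative of xi
    return 0.5 * math.cos(theta)

def zeta_cartesian(u, v):
    # zeta([x, r]) = xi(x) r^2 written in Cartesian coordinates (u, v) = (r cos x, r sin x)
    return xi(math.atan2(v, u)) * (u * u + v * v)

def gradient_norm_sq(u, v, h=1e-6):
    # squared Euclidean gradient norm of zeta by central finite differences
    du = (zeta_cartesian(u + h, v) - zeta_cartesian(u - h, v)) / (2.0 * h)
    dv = (zeta_cartesian(u, v + h) - zeta_cartesian(u, v - h)) / (2.0 * h)
    return du * du + dv * dv

def formula_8_40(x, r):
    # right-hand side of (8.40) for r > 0: (|D xi|^2 + 4 xi^2) r^2
    return (dxi(x) ** 2 + 4.0 * xi(x) ** 2) * r * r
```

For instance, evaluating both functions at the cone point $[x, r] = [0.4, 1.3]$ gives matching values up to discretization error.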


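Hopf-Lax formula and subsolutions to metric Hamilton-Jacobi equation in the
cone $\mathfrak C$. Whenever $f \in \mathrm{Lip}_b(\mathfrak C)$ the Hopf-Lax formula

$$\mathcal Q_t f(\mathfrak y) := \inf_{\mathfrak y' \in \mathfrak C} \Bigl( f(\mathfrak y') + \frac{1}{2t}\, \mathsf d_{\mathfrak C}^2(\mathfrak y, \mathfrak y') \Bigr) \quad\text{for } \mathfrak y \in \mathfrak C \text{ and } t > 0, \tag{8.42}$$

provides a function $t \mapsto \mathcal Q_t f$ which is Lipschitz from $[0, \infty)$ to $C_b(\mathfrak C)$, satisfies the a-priori
bounds

$$\inf_{\mathfrak C} f \le \mathcal Q_t f \le \sup_{\mathfrak C} f, \qquad \mathrm{Lip}(\mathcal Q_t f; \mathfrak C) \le 2\, \mathrm{Lip}(f, \mathfrak C), \tag{8.43}$$

and solves

$$\partial_t^+ \mathcal Q_t f(\mathfrak y) + \tfrac12 |\mathrm D_{\mathfrak C} \mathcal Q_t f|_a^2(\mathfrak y) \le 0 \quad\text{for every } \mathfrak y \in \mathfrak C,\ t > 0, \tag{8.44}$$

where $\partial_t^+$ denotes the partial right derivative w.r.t. $t$. It is also possible to prove that for
every $\mathfrak y \in \mathfrak C$ the time derivative of $\mathcal Q_t f(\mathfrak y)$ exists with possibly countable exceptions and
that (8.44) is in fact an equality if $(\mathfrak C, \mathsf d_{\mathfrak C})$ is a length space, a property that always holds
if $(X, \mathsf d)$ is a length metric space. This is stated in our main result: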

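Theorem 8.11 (Metric subsolution of Hamilton-Jacobi equation in $X$). Let $\xi \in \mathrm{Lip}_b(X)$
satisfy the uniform lower bound $P := \inf_X \xi + 1/2 > 0$ and let us set $\zeta([x,r]) := \xi(x)\,r^2$.
Then, for every $t \in [0,1]$ we have

$$\mathcal Q_t \zeta([x,r]) = \xi_t(x)\,r^2, \quad\text{where } \xi_t := \mathcal P_t \xi \text{ and} \tag{8.45}$$

$$\mathcal P_t \xi(x) := \inf_{x' \in X} \Bigl( \frac{\xi(x')}{1 + 2t\xi(x')} + \frac{\sin^2(\mathsf d_{\pi/2}(x,x'))}{2t(1 + 2t\xi(x'))} \Bigr) = \inf_{x' \in X} \frac{1}{2t} \Bigl( 1 - \frac{\cos^2(\mathsf d_{\pi/2}(x,x'))}{1 + 2t\xi(x')} \Bigr).$$

Moreover, for every $R > 0$ we have

$$\mathcal P_t \xi(x)\,r^2 = \inf_{\mathfrak y' = [x',r'] \in \mathfrak C[R]} \Bigl( \xi(x')(r')^2 + \frac{1}{2t}\, \mathsf d_{\mathfrak C}^2([x,r],[x',r']) \Bigr) \quad\text{for all } x \in X,\ r \le PR. \tag{8.46}$$

The map $t \mapsto \xi_t$ is Lipschitz from $[0,1]$ to $C_b(X)$ with $\xi_t \in \mathrm{Lip}_b(X)$ for every $t \in [0,1]$.
Moreover, $\xi_t$ is a subsolution to the generalized Hamilton-Jacobi equation

$$\partial_t^+ \xi_t(x) + \tfrac12 |\mathrm D_X \xi_t|_a^2(x) + 2\xi_t^2(x) \le 0 \quad\text{for every } x \in X \text{ and } t \in [0,1]. \tag{8.47a}$$

For every $x \in X$ the map $t \mapsto \xi_t(x)$ is time differentiable with at most countable exceptions.
If $(X, \mathsf d)$ is a length space, (8.47a) holds with equality and $|\mathrm D_X \xi_t|_a(x) = |\mathrm D_X \xi_t|(x)$ for every
$x \in X$ and $t \in [0,1]$:

$$\partial_t^+ \xi_t(x) + \tfrac12 |\mathrm D_X \xi_t|_a^2(x) + 2\xi_t^2(x) = 0, \qquad |\mathrm D_X \xi_t|_a(x) = |\mathrm D_X \xi_t|(x). \tag{8.47b}$$

Notice that when $\xi(x) \equiv \xi$ is constant, (8.45) reduces to $\mathcal P_t \xi = \xi/(1 + 2t\xi)$, which is
the solution to the elementary differential equation $\frac{\mathrm d}{\mathrm dt}\xi + 2\xi^2 = 0$.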
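The explicit formula in (8.45) is easy to evaluate on a grid. The following is a minimal sketch, assuming $X$ is the real line with $\mathsf d(x,x') = |x - x'|$ and replacing the infimum over $X$ by a minimum over finitely many grid points; the function name and the discretization are assumptions made for illustration, not part of the source.

```python
import math

def hopf_lax_P(xi, xs, t):
    """Approximate (P_t xi)(x) from (8.45) for x in the grid xs.

    xi : list of values xi(x') on the grid xs, with inf xi > -1/2
    t  : time in (0, 1]
    The infimum over X is replaced by a minimum over the grid points.
    """
    out = []
    for x in xs:
        best = float("inf")
        for xp, val in zip(xs, xi):
            d = min(abs(x - xp), math.pi / 2)     # truncated distance d_{pi/2}
            denom = 1.0 + 2.0 * t * val           # positive since inf xi > -1/2 and t <= 1
            cand = val / denom + math.sin(d) ** 2 / (2.0 * t * denom)
            best = min(best, cand)
        out.append(best)
    return out
```

For a constant datum $\xi \equiv c$ the minimum is attained at $x' = x$, and the sketch returns $c/(1 + 2tc)$ at every grid point, in agreement with the remark on the constant case.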

Proof. Let us observe that $\inf_{t \in [0,1],\, z \in X} (1 + 2t\xi(z)) = P > 0$. A simple calculation shows

$$
\begin{aligned}
\xi(x')(r')^2 + \frac{1}{2t}\, \mathsf d_{\mathfrak C}^2([x,r];[x',r']) &= \frac{1}{2t} \Bigl( (1 + 2t\xi(x'))(r')^2 + r^2 - 2 r r' \cos(\mathsf d_\pi(x,x')) \Bigr) \\
&= \frac{\bigl( (1 + 2t\xi(x'))\, r' - \cos(\mathsf d_\pi(x,x'))\, r \bigr)^2 + r^2 \bigl( 2t\xi(x') + \sin^2(\mathsf d_\pi(x,x')) \bigr)}{2t(1 + 2t\xi(x'))}.
\end{aligned}
$$

Hence, if we choose

$$r' := \begin{cases} r \cos(\mathsf d_\pi(x,x'))/(1 + 2t\xi(x')) & \text{if } \mathsf d(x,x') \le \pi/2, \\ 0 & \text{otherwise,} \end{cases}$$

we find (notice the truncation at $\pi/2$ instead of $\pi$)

$$\inf_{r' \ge 0} \Bigl( \xi(x')(r')^2 + \frac{1}{2t}\, \mathsf d_{\mathfrak C}^2([x,r];[x',r']) \Bigr) = \frac{r^2 \bigl( 2t\xi(x') + \sin^2(\mathsf d_{\pi/2}(x,x')) \bigr)}{2t(1 + 2t\xi(x'))},$$

which yields (8.45) and (8.46).

Equation (8.46) also shows that the function $\zeta_t = \xi_t r^2$ coincides on $\mathfrak C[PR]$ with
the solution $\zeta_t^R$ given by the Hopf-Lax formula in the metric space $\mathfrak C[R]$. Since the initial
datum $\zeta$ is bounded and Lipschitz on $\mathfrak C[R]$ we deduce that $\zeta_t^R$ is bounded and Lipschitz,
so that $\xi_t$ is bounded and Lipschitz in $X$ by Lemma 8.10.

Equation (8.47a) and the other regularity properties then follow by (8.40) and the
general properties of the Hopf-Lax formula in $\mathfrak C[R]$. □


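Duality between the Hellinger-Kantorovich distance and subsolutions to the
generalized Hamilton-Jacobi equation. We conclude this section with the main
application of the above results to the Hellinger-Kantorovich distance.

Theorem 8.12. Let us suppose that $(X, \mathsf d)$ is a complete and separable metric space.

(i) If $\mu \in \mathrm{AC}^2([0,1]; (\mathcal M(X), \mathsf{HK}))$ and $\xi : [0,1] \to \mathrm{Lip}_b(X)$ is uniformly bounded,
Lipschitz w.r.t. the uniform norm, and satisfies (8.47a), then the curve $t \mapsto \int_X \xi_t\,\mathrm d\mu_t$ is
absolutely continuous and

$$\frac{\mathrm d}{\mathrm dt} \int_X \xi_t\,\mathrm d\mu_t \le \frac12 |\mu'_t|^2 \quad\text{for } \mathscr L^1\text{-a.e. } t \in (0,1). \tag{8.49}$$

(ii) If $(X, \mathsf d)$ is a length space, then for every $\mu_0, \mu_1 \in \mathcal M(X)$ and $k \in \mathbb N \cup \{\infty\}$ we have

$$\frac12 \mathsf{HK}^2(\mu_0, \mu_1) = \sup \Bigl\{ \int_X \xi_1\,\mathrm d\mu_1 - \int_X \xi_0\,\mathrm d\mu_0 : \xi \in C^k([0,1]; \mathrm{Lip}_b(X)),\ \partial_t \xi_t(x) + \tfrac12 |\mathrm D_X \xi_t|^2(x) + 2\xi_t^2(x) \le 0 \text{ in } X \times (0,1) \Bigr\}. \tag{8.50}$$

Moreover, in the above formula we can also take the supremum over functions $\xi \in
C^k([0,1]; \mathrm{Lip}_b(X))$ with bounded support.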

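Proof. If $\xi$ satisfies (8.47a) then setting $\zeta_t([x,r]) := \xi_t(x)\,r^2$ we obtain a family of functions
$t \mapsto \zeta_t$, $t \in [0,1]$, whose restriction to every $\mathfrak C[R]$ is uniformly bounded and Lipschitz, and
it is Lipschitz continuous with respect to the uniform norm of $C_b(\mathfrak C[R])$. By Lemma 8.10
the function $\zeta$ solves

$$\partial_t^+ \zeta_t + \tfrac12 |\mathrm D_{\mathfrak C} \zeta_t|_a^2 \le 0 \quad\text{in } \mathfrak C \times (0,1).$$

According to Theorem 8.4 we find $\Theta > 0$ and a curve $\alpha \in \mathrm{AC}^2([0,1]; (\mathcal P_2(\mathfrak C[\Theta]), \mathsf W_{\mathsf d_{\mathfrak C}}))$
satisfying (8.19). Applying the results of [6, Sect. 6], the map $t \mapsto \int_{\mathfrak C} \zeta_t\,\mathrm d\alpha_t$ is absolutely
continuous with

$$\frac{\mathrm d}{\mathrm dt} \int_{\mathfrak C} \zeta_t\,\mathrm d\alpha_t \le \frac12 |\alpha'_t|^2 \quad \mathscr L^1\text{-a.e. in } (0,1).$$

Since $\int_{\mathfrak C} \zeta_t\,\mathrm d\alpha_t = \int_X \xi_t\,\mathrm d\mu_t$ we obtain (8.49).

Let us now prove (ii). As a first step, denoting by $S$ the right-hand side of (8.50), we
prove that $\tfrac12 \mathsf{HK}^2(\mu_0, \mu_1) \ge S$. If $\xi \in C^1([0,1]; \mathrm{Lip}_b(X))$ satisfies the pointwise inequality

$$\partial_t \xi_t(x) + \tfrac12 |\mathrm D_X \xi_t|^2(x) + 2\xi_t^2(x) \le 0, \tag{8.51}$$

then it also satisfies (8.47a), because (8.51) provides the relation

$$\tfrac12 |\mathrm D_X \xi_t|^2(x) \le -\bigl( \partial_t \xi_t(x) + 2\xi_t^2(x) \bigr) \quad\text{for every } (x,t) \in X \times (0,1), \tag{8.52}$$

where the right hand side is bounded and continuous in $X$. Equation (8.52) thus yields
the same inequality for the upper semicontinuous envelope of $|\mathrm D_X \xi_t|$, and this function
coincides with $|\mathrm D_X \xi_t|_a$ since $X$ is a length space.

We can therefore apply the previous point (i) by choosing $\lambda > 1$ and a Lipschitz
curve $\mu : [0,1] \to \mathcal M(X)$ joining $\mu_0$ to $\mu_1$ with metric velocity $|\mu'_t|_{\mathsf{HK}} \le \lambda\, \mathsf{HK}(\mu_0, \mu_1)$, whose
existence is guaranteed by the length property of $X$ and a standard rescaling technique.
Relation (8.49) yields

$$2 \int_X \xi_1\,\mathrm d\mu_1 - 2 \int_X \xi_0\,\mathrm d\mu_0 \le \int_0^1 |\mu'_t|^2\,\mathrm dt \le \lambda^2\, \mathsf{HK}^2(\mu_0, \mu_1).$$

Since $\lambda > 1$ is arbitrary, we get $\tfrac12 \mathsf{HK}^2(\mu_0, \mu_1) \ge S$.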

In order to prove the converse inequality in (8.50) we fix $\eta > 0$ and apply the duality
Theorem 7.21 to get $\xi_0 \in \mathrm{Lip}_{bs}(X)$ (the space of Lipschitz functions with bounded
support) with $\inf \xi_0 > -1/2$ such that

$$2 \int_X \mathcal P_1 \xi_0\,\mathrm d\mu_1 - 2 \int_X \xi_0\,\mathrm d\mu_0 \ge \mathsf{HK}^2(\mu_0, \mu_1) - \eta. \tag{8.53}$$

Setting $\xi_t := \mathcal P_t \xi_0$ we find a solution to (8.47a) which has bounded support, is uniformly
bounded in $\mathrm{Lip}_b(X)$ and Lipschitz with respect to the uniform norm. We have to show that
$(\xi_t)_{t \in [0,1]}$ can be suitably approximated by smoother solutions $\xi^\varepsilon \in C^\infty([0,1]; \mathrm{Lip}_b(X))$,
$\varepsilon > 0$, in such a way that $\int \xi_i^\varepsilon\,\mathrm d\mu_i \to \int \xi_i\,\mathrm d\mu_i$ as $\varepsilon \downarrow 0$ for $i = 0, 1$.

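We use an argument of [1], which relies on the scaling invariance of the generalized
Hamilton-Jacobi equation: if $\xi$ solves (8.51) and $\lambda > 0$, then $\xi^\lambda_t(x) := \lambda\, \xi_{\lambda t + t_0}(x)$ solves
(8.51) as well. Hence, by approximating $\xi_t$ with $\lambda\, \xi(\lambda t + (1 - \lambda)/2, x)$ with $0 < \lambda < 1$ and
passing to the limit $\lambda \uparrow 1$, it is not restrictive to assume that $\xi$ is defined in a larger
interval $[a,b]$, with $a < 0$, $b > 1$. A time convolution is then well defined on $[0,1]$: using
a symmetric, nonnegative kernel $\kappa \in C_c^\infty(\mathbb R)$ with integral 1, the convolution

$$\xi^\varepsilon_t(x) := (\xi_{(\cdot)}(x) * \kappa_\varepsilon)_t = \int_{\mathbb R} \xi_w(x)\, \kappa_\varepsilon(t - w)\,\mathrm dw, \quad\text{where } \kappa_\varepsilon(t) := \varepsilon^{-1} \kappa(t/\varepsilon), \tag{8.54}$$

yields a curve $\xi^\varepsilon \in C^\infty([0,1]; \mathrm{Lip}_b(X))$ satisfying

$$\partial_t \xi^\varepsilon_t + \tfrac12 \bigl( |\mathrm D_X \xi_{(\cdot)}|^2 \bigr) * \kappa_\varepsilon + 2 \bigl( \xi_{(\cdot)}^2 \bigr) * \kappa_\varepsilon \le 0 \quad\text{in } X \times [0,1].$$

By Jensen inequality $\xi_{(\cdot)}^2 * \kappa_\varepsilon \ge (\xi_{(\cdot)} * \kappa_\varepsilon)^2$ and $|\mathrm D_X \xi_{(\cdot)}|^2 * \kappa_\varepsilon \ge (|\mathrm D_X \xi_{(\cdot)}| * \kappa_\varepsilon)^2$. Moreover,
applying the following Lemma 8.13 we also get $|\mathrm D_X \xi_{(\cdot)}| * \kappa_\varepsilon \ge |\mathrm D_X (\xi_{(\cdot)} * \kappa_\varepsilon)|$, so that the
smooth convolution $\xi^\varepsilon$ satisfies (8.51). Since $\xi^\varepsilon_t \to \xi_t$ uniformly in $X$ for every $t \in [0,1]$,
we easily get

$$2S \ge \lim_{\varepsilon \downarrow 0} 2 \Bigl( \int_X \xi^\varepsilon_1\,\mathrm d\mu_1 - \int_X \xi^\varepsilon_0\,\mathrm d\mu_0 \Bigr) \ge \mathsf{HK}^2(\mu_0, \mu_1) - \eta.$$

Since $\eta > 0$ is arbitrary the proof of (ii) is complete. □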
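The scaling invariance used at the beginning of this argument can be verified directly. As a sketch, for a smooth $\xi$ and $\xi^\lambda_t(x) := \lambda\, \xi_{\lambda t + t_0}(x)$, writing $s := \lambda t + t_0$ (a shorthand introduced here):

```latex
\[
\partial_t \xi^\lambda_t + \tfrac12\,|\mathrm D_X \xi^\lambda_t|^2 + 2\,(\xi^\lambda_t)^2
 = \lambda^2 \Bigl( \partial_s \xi_s + \tfrac12\,|\mathrm D_X \xi_s|^2 + 2\,\xi_s^2 \Bigr)\Big|_{s = \lambda t + t_0} .
\]
```

Each of the three terms picks up the same factor $\lambda^2$ (one $\lambda$ from the prefactor and one from the chain rule in the time derivative, two from the prefactor in the quadratic terms), so the sign of the left-hand side of (8.51) is preserved.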

The next result shows that averaging w.r.t. a probability measure $\pi \in \mathcal P(\Omega)$ does not
increase the metric slope nor the asymptotic Lipschitz constant. This was used in the
last proof for the temporal smoothing and will be used for spatial smoothing in Corollary
8.14.

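Lemma 8.13. Let $(X, \mathsf d)$ be a separable metric space, let $(\Omega, \mathcal B, \pi)$ be a probability space
(i.e. $\pi(\Omega) = 1$) and let $\xi_\omega \in \mathrm{Lip}_b(X)$, $\omega \in \Omega$, be a family of uniformly bounded functions
such that $\sup_{\omega \in \Omega} \mathrm{Lip}(\xi_\omega; X) < \infty$ and $\omega \mapsto \xi_\omega(x)$ is $\mathcal B$-measurable for every $x \in X$. Then
the function $x \mapsto \xi(x) := \int_\Omega \xi_\omega(x)\,\mathrm d\pi(\omega)$ belongs to $\mathrm{Lip}_b(X)$ and for every $x \in X$ the
maps $\omega \mapsto |\mathrm D_X \xi_\omega|(x)$ and $\omega \mapsto |\mathrm D_X \xi_\omega|_a(x)$ are $\mathcal B$-measurable and satisfy

$$|\mathrm D_X \xi|_a(x) \le \int_\Omega |\mathrm D_X \xi_\omega|_a(x)\,\mathrm d\pi(\omega), \qquad |\mathrm D_X \xi|(x) \le \int_\Omega |\mathrm D_X \xi_\omega|(x)\,\mathrm d\pi(\omega). \tag{8.55}$$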

Proof. The fact that $\xi \in \mathrm{Lip}_b(X)$ is obvious. To show measurability we fix $x \in X$
and use the expression (8.34) for $|\mathrm D_X \xi_\omega|_a(x)$. It is sufficient to prove that for every $r >
0$ the map $\omega \mapsto s_{r,\omega}(x) := \sup_{y \ne z \in B_r(x)} |\xi_\omega(y) - \xi_\omega(z)| / \mathsf d(y,z)$ is $\mathcal B$-measurable. This
property follows by the continuity of $\xi_\omega$ and the separability of $X$, so that it is possible to
restrict the supremum to a countable dense collection of points $\tilde B_r(x)$ in $B_r(x)$. Thus, the
measurability follows, because the pointwise supremum of countably many measurable
functions is measurable. An analogous argument holds for $|\mathrm D_X \xi_\omega|$.
Using the definition $\xi = \int_\Omega \xi_\omega\,\mathrm d\pi$ we have

$$\frac{|\xi(y) - \xi(z)|}{\mathsf d(y,z)} \le \int_\Omega \frac{|\xi_\omega(y) - \xi_\omega(z)|}{\mathsf d(y,z)}\,\mathrm d\pi(\omega) \quad\text{for } y \ne z.$$

Taking the supremum with respect to $y, z \in B_r(x)$ and $y \ne z$, we obtain

$$\sup_{y \ne z \in B_r(x)} \frac{|\xi(y) - \xi(z)|}{\mathsf d(y,z)} \le \int_\Omega s_{r,\omega}(x)\,\mathrm d\pi(\omega).$$

A further limit as $r \downarrow 0$ and the application of the Lebesgue Dominated Convergence
Theorem yield the first inequality of (8.55). The argument to prove the second inequality
is completely analogous. □
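The inequality (8.55) can be strict. The sketch below is an illustrative example added here, not taken from the source: $\Omega = \{-1, +1\}$ with equal weights and $\xi_\omega(x) = |x - \omega|$ on the real line, so that the average is constant on $[-1,1]$ while each $\xi_\omega$ has local Lipschitz constant 1. All names and the sampling scheme are assumptions.

```python
def sup_difference_quotient(f, x, radius, n=50):
    # sup over sampled pairs y != z in the ball B_radius(x) of |f(y) - f(z)| / |y - z|
    pts = [x - radius + 2.0 * radius * k / n for k in range(n + 1)]
    best = 0.0
    for i, y in enumerate(pts):
        for z in pts[i + 1:]:
            best = max(best, abs(f(y) - f(z)) / abs(y - z))
    return best

family = {-1.0: 0.5, 1.0: 0.5}   # omega -> pi({omega}), a two-point probability space

def xi_omega(omega):
    # member of the family: xi_omega(x) = |x - omega|
    return lambda x: abs(x - omega)

def xi_avg(x):
    # averaged function xi(x) = integral of xi_omega(x) d pi(omega)
    return sum(w * abs(x - omega) for omega, w in family.items())

def both_sides(x, radius=1e-2):
    # sampled local Lipschitz constants: left- and right-hand side of (8.55) before r -> 0
    lhs = sup_difference_quotient(xi_avg, x, radius)
    rhs = sum(w * sup_difference_quotient(xi_omega(om), x, radius)
              for om, w in family.items())
    return lhs, rhs
```

At $x = 0$ the left-hand side vanishes up to rounding while the right-hand side equals 1, so averaging can strictly decrease the asymptotic Lipschitz constant.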

When $X = \mathbb R^d$ the characterization (8.50) of $\mathsf{HK}$ holds for an even smoother class of
subsolutions $\xi$ of the generalized Hamilton-Jacobi equation.

Corollary 8.14. Let $X = \mathbb R^d$ be endowed with the Euclidean distance. Then

$$\mathsf{HK}^2(\mu_0, \mu_1) = 2 \sup \Bigl\{ \int_{\mathbb R^d} \xi_1\,\mathrm d\mu_1 - \int_{\mathbb R^d} \xi_0\,\mathrm d\mu_0 : \xi \in C_c^\infty(\mathbb R^d \times [0,1]),\ \partial_t \xi_t(x) + \tfrac12 |\mathrm D_x \xi_t(x)|^2 + 2\xi_t^2(x) \le 0 \text{ in } \mathbb R^d \times (0,1) \Bigr\}. \tag{8.56}$$

Proof. We just have to check that the supremum of (8.50) does not change if we substitute
$C^\infty([0,1]; \mathrm{Lip}_{bs}(\mathbb R^d))$ with $C_c^\infty(\mathbb R^d \times [0,1])$. This can be achieved by approximating any
subsolution $\xi \in C^\infty([0,1]; \mathrm{Lip}_{bs}(\mathbb R^d))$ via convolution in space with a smooth kernel with
compact support, which still provides a subsolution thanks to Lemma 8.13. □


8.5 The dynamic interpretation of the Hellinger-Kantorovich
distance “à la Benamou-Brenier”

In this section we will apply the superposition principle of Theorem 8.4 and the duality
result of Theorem 8.12 with subsolutions of the Hamilton-Jacobi equation to quickly derive a dynamic
formulation “à la Benamou-Brenier” (see [2, Sect. 8]) of the Hellinger-Kantorovich
distance, which has also been considered in the recent [21]. In order to keep the
exposition simpler, we will consider the case $X = \mathbb R^d$ with the canonical Euclidean distance
$\mathsf d(x_1, x_2) := |x_1 - x_2|$, but the result can be extended to more general Riemannian and
metric settings, e.g. arguing as in [6, Sect. 6]. A different approach, based on suitable
representation formulae for the continuity equation, is discussed in our companion paper 


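Our starting point is provided by a suitable class of linear continuity equations with
reaction. In the following we will denote by $\mu_I \in \mathcal M(\mathbb R^d \times [0,1])$ the measure induced by
a curve $\mu \in C^0([0,1]; \mathcal M(\mathbb R^d))$ via

$$\int \xi\,\mathrm d\mu_I := \int_0^1 \int_{\mathbb R^d} \xi_t(x)\,\mathrm d\mu_t(x)\,\mathrm dt. \tag{8.57}$$

Definition 8.15. Let $\mu \in C^0([0,1]; \mathcal M(\mathbb R^d))$, let $(v, w) : \mathbb R^d \times (0,1) \to \mathbb R^{d+1}$ be a Borel
vector field in $L^2(\mathbb R^d \times (0,1), \mu_I; \mathbb R^{d+1})$, thus satisfying

$$\int_0^1 \int_{\mathbb R^d} \bigl( |v_t(x)|^2 + w_t^2(x) \bigr)\,\mathrm d\mu_t(x)\,\mathrm dt = \int |(v, w)|^2\,\mathrm d\mu_I < \infty. \tag{8.58}$$

We say that $\mu$ satisfies the continuity equation with reaction governed by $(v, w)$ if

$$\partial_t \mu_t + \nabla \cdot (v_t \mu_t) = w_t \mu_t \quad\text{holds in the sense of distributions in } \mathbb R^d \times (0,1), \tag{8.59}$$

i.e. for every test function $\xi \in C_c^\infty(\mathbb R^d \times (0,1))$

$$\int_0^1 \int_{\mathbb R^d} \bigl( \partial_t \xi_t(x) + \mathrm D_x \xi_t(x)\, v_t(x) + \xi_t(x)\, w_t(x) \bigr)\,\mathrm d\mu_t\,\mathrm dt = 0. \tag{8.60}$$

An equivalent formulation [2, Sect. 8.1] of (8.59) is

$$\frac{\mathrm d}{\mathrm dt} \int_{\mathbb R^d} \xi(x)\,\mathrm d\mu_t(x) = \int_{\mathbb R^d} \bigl( \mathrm D_x \xi(x)\, v_t(x) + \xi(x)\, w_t(x) \bigr)\,\mathrm d\mu_t \quad\text{in } \mathscr D'(0,1), \tag{8.61}$$

for every $\xi \in C_c^\infty(\mathbb R^d)$. We have a first representation result for absolutely continuous
curves $t \mapsto \mu_t$, which relies on Theorem 8.4 where we constructed suitable lifted plans
$\pi \in \mathcal P(\mathrm{AC}^2([0,1]; \mathfrak C))$, i.e. $\mu_t = \mathfrak h^2 (\mathsf e_t)_\sharp \pi$, where $\mathfrak C$ is now the cone over $\mathbb R^d$.

Theorem 8.16. Let $(\mu_t)_{t \in [0,1]}$ be a curve in $\mathrm{AC}^2([0,1]; (\mathcal M(\mathbb R^d), \mathsf{HK}))$. Then $\mu$ satisfies
the continuity equation with reaction (8.59) with a Borel vector field $(v, w) \in L^2(\mathbb R^d \times
(0,1), \mu_I; \mathbb R^{d+1})$ satisfying

$$\int_{\mathbb R^d} \bigl( |v_t|^2 + \tfrac14 |w_t|^2 \bigr)\,\mathrm d\mu_t \le |\mu'_t|^2 \quad\text{for } \mathscr L^1\text{-a.e. } t \in (0,1). \tag{8.62}$$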

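Proof. We will denote by $I$ the interval $[0,1]$ endowed with the Lebesgue measure $\lambda =
\mathscr L^1 \llcorner [0,1]$. Recalling the map $(\mathsf x, \mathsf r) : \mathfrak C \to \mathbb R^d \times [0,\infty)$ we define the maps $\mathsf x_I : C(I; \mathfrak C) \times I \to
\mathbb R^d \times I$ and $\mathsf R : C(I; \mathfrak C) \times I \to \mathbb R_+$ via $\mathsf x_I(z,t) := (\mathsf x(z(t)), t)$ and $\mathsf R(z,t) := \mathsf r(z(t))$.

Let $\pi$ be a dynamic plan in $\mathfrak C$ representing $\mu_t$ as in Theorem 8.4. We consider the
deformed dynamic plan $\pi_I := \mathsf R^2 (\pi \otimes \lambda)$, the measure $(\mathsf x_I)_\sharp \pi_I$ and the disintegration
$(\tilde\pi_{x,t})_{(x,t) \in \mathbb R^d \times I}$ of $\pi_I$ with respect to $\mu_I$. Notice that $\pi_I \le \Theta^2\, \pi \otimes \lambda$, where $\Theta$ is given by (8.18),
and that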

Ti = [ dt) dX{t), (8.63) 

Jo 

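coincides with $\mu_I$ in (8.57), because for every $\xi \in B_b(\mathbb R^d \times I)$ we have

$$\int \xi\,\mathrm d (\mathsf x_I)_\sharp \pi_I = \int \xi(\mathsf x(z(t)), t)\, \mathsf r^2(z(t))\,\mathrm d(\pi \otimes \lambda)(z,t) = \int_0^1 \int_{\mathbb R^d} \xi_t(x)\,\mathrm d\mu_t(x)\,\mathrm dt = \int \xi\,\mathrm d\mu_I.$$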

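Let $u \in L^2(\mathrm{AC}^2(I; \mathfrak C) \times I; \pi \otimes \lambda; \mathbb R^{d+1})$ be the Borel vector field $u(\mathfrak y, t) := \mathfrak y'_{\mathfrak C}(t)$ for
every curve $\mathfrak y \in \mathrm{AC}^2(I; \mathfrak C)$ and $t \in I$, where $\mathfrak y'_{\mathfrak C}$ is defined as in (8.12). By taking the
density of the vector measure $(\mathsf x_I)_\sharp (u\, \pi_I)$ with respect to $\mu_I$ we obtain a Borel vector field
$u_I = (v, \tilde w) \in L^2(\mathbb R^d \times I; \mu_I; \mathbb R^{d+1})$ which satisfies

$$u_I(x,t) = \int u\,\mathrm d\tilde\pi_{x,t} \quad\text{for } \mu_I\text{-a.e. } (x,t) \in \mathbb R^d \times I \quad\text{and}\quad \int_{\mathbb R^d} \bigl( |v_t|^2 + |\tilde w_t|^2 \bigr)\,\mathrm d\mu_t \le |\mu'_t|^2. \tag{8.64}$$

Choosing a test function $\zeta([x,r], t) := \xi(x)\, \eta(t)\, r^2$ with $\xi \in C_c^\infty(\mathbb R^d)$ and $\eta \in C_c^\infty((0,1))$ we can
exploit the chain rule (8.14) in $\mathbb R^d$ and find

$$
\begin{aligned}
-\int_0^1 \eta'(t) \int_{\mathbb R^d} \xi\,\mathrm d\mu_t\,\mathrm dt &= -\int_{\mathbb R^d \times I} \eta'(t)\, \xi(x)\,\mathrm d\mu_I = -\int \xi(\mathsf x(\mathfrak y(t)))\, \mathsf r^2(\mathfrak y(t))\, \eta'(t)\,\mathrm d(\pi \otimes \lambda) \\
&= -\int \partial_t \zeta(\mathfrak y(t), t)\,\mathrm d(\pi \otimes \lambda) = \int \Bigl( -\frac{\mathrm d}{\mathrm dt} \zeta(\mathfrak y(t), t) + \langle \mathrm D_{\mathfrak C} \zeta(\mathfrak y(t), t), \mathfrak y'_{\mathfrak C}(t) \rangle \Bigr)\,\mathrm d(\pi \otimes \lambda) \\
&= \int \eta\, \langle (\mathrm D_x \xi(\mathsf x_I), 2\xi(\mathsf x_I)), u \rangle\, \mathsf R^2\,\mathrm d(\pi \otimes \lambda) \\
&= \int \eta(t)\, \langle (\mathrm D_x \xi(x), 2\xi(x)), u_I \rangle\,\mathrm d\mu_I \\
&= \int_0^1 \eta(t) \int_{\mathbb R^d} \bigl( \langle \mathrm D_x \xi(x), v_t(x) \rangle + 2\xi(x)\, \tilde w_t(x) \bigr)\,\mathrm d\mu_t\,\mathrm dt.
\end{aligned}
$$

Setting $w_t = 2\tilde w_t$ the continuity equation with reaction (8.61) holds. □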
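As a simple illustration of Definition 8.15 (an example added here, not taken from the source, and stated under the assumption that $a : [0,1] \to (0,\infty)$ and $x : [0,1] \to \mathbb R^d$ are $C^1$), consider a single moving atom $\mu_t = a(t)\, \delta_{x(t)}$. Testing (8.61) with $\xi \in C_c^\infty(\mathbb R^d)$ gives:

```latex
\[
\frac{\mathrm d}{\mathrm dt} \int_{\mathbb R^d} \xi \,\mathrm d\mu_t
 = \frac{\mathrm d}{\mathrm dt}\bigl( a(t)\,\xi(x(t)) \bigr)
 = a(t)\,\mathrm D_x\xi(x(t))\,\dot x(t) + \dot a(t)\,\xi(x(t))
 = \int_{\mathbb R^d} \bigl( \mathrm D_x\xi\, v_t + \xi\, w_t \bigr)\,\mathrm d\mu_t ,
\]
\[
\text{with}\qquad v_t(x(t)) = \dot x(t), \qquad w_t(x(t)) = \frac{\dot a(t)}{a(t)} .
\]
```

Thus the atom is transported with velocity $\dot x$ while its mass grows at the relative rate $\dot a / a$, and the weighted integrand appearing in (8.62) becomes $\bigl( |\dot x(t)|^2 + \tfrac14 (\dot a(t)/a(t))^2 \bigr)\, a(t)$.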

The next result provides the opposite inequality, which will be deduced from the
duality between the solutions of the generalized Hamilton-Jacobi equation and $\mathsf{HK}$ developed
in Theorem 8.12.

Theorem 8.17. Let $(\mu_t)_{t \in [0,1]}$ be a continuous curve in $\mathcal M(\mathbb R^d)$ that solves the continuity
equation with reaction (8.59) governed by the Borel vector field $(v, w) \in L^2(\mathbb R^d \times
[0,1], \mu_I; \mathbb R^{d+1})$ with $\mu_I$ given by (8.57). Then $\mu \in \mathrm{AC}^2([0,1]; (\mathcal M(\mathbb R^d), \mathsf{HK}))$ and

$$|\mu'_t|^2 \le \int_{\mathbb R^d} \bigl( |v_t|^2 + \tfrac14 |w_t|^2 \bigr)\,\mathrm d\mu_t \quad\text{for } \mathscr L^1\text{-a.e. } t \in (0,1). \tag{8.65}$$


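Proof. The simple scaling $\xi(t,x) \to (b-a)\, \xi(a + (b-a)t, x)$ transforms any subsolution of
the Hamilton-Jacobi equation in $[a,b]$ to a subsolution of the same equation in $[0,1]$.
Thus,

$$\mathsf{HK}^2(\mu_0, \mu_1) = 2(b-a) \sup \Bigl\{ \int_{\mathbb R^d} \xi_b\,\mathrm d\mu_1 - \int_{\mathbb R^d} \xi_a\,\mathrm d\mu_0 : \xi \in C_c^\infty(\mathbb R^d \times [a,b]),\ \partial_t \xi_t(x) + \tfrac12 |\mathrm D_x \xi_t(x)|^2 + 2\xi_t^2(x) \le 0 \text{ in } \mathbb R^d \times (a,b) \Bigr\}. \tag{8.66}$$

Let $\xi \in C_c^\infty(\mathbb R^d \times [0,1])$ be a subsolution to the Hamilton-Jacobi equation $\partial_t \xi + \tfrac12 |\mathrm D_x \xi|^2 +
2\xi^2 \le 0$ in $\mathbb R^d \times [0,1]$. By a standard argument (see [2, Lem. 8.1.2]), the integrability (8.58)
and the weak continuity of $t \mapsto \mu_t$ yield

$$
\begin{aligned}
2 \int_{\mathbb R^d} \xi_{t_1}\,\mathrm d\mu_{t_1} - 2 \int_{\mathbb R^d} \xi_{t_0}\,\mathrm d\mu_{t_0} &= 2 \int_{t_0}^{t_1} \int_{\mathbb R^d} \bigl( \partial_t \xi_t + \langle \mathrm D_x \xi_t, v_t \rangle + \xi_t w_t \bigr)\,\mathrm d\mu_t\,\mathrm dt \\
&\le 2 \int_{t_0}^{t_1} \int_{\mathbb R^d} \bigl( -\tfrac12 |\mathrm D_x \xi_t|^2 - 2\xi_t^2 + \langle \mathrm D_x \xi_t, v_t \rangle + \xi_t w_t \bigr)\,\mathrm d\mu_t\,\mathrm dt \\
&\le \int_{t_0}^{t_1} \int_{\mathbb R^d} \bigl( |v_t|^2 + \tfrac14 |w_t|^2 \bigr)\,\mathrm d\mu_t\,\mathrm dt.
\end{aligned}
$$

Applying Corollary 8.14 and (8.66) we find

$$\mathsf{HK}^2(\mu_{t_0}, \mu_{t_1}) \le (t_1 - t_0) \int_{t_0}^{t_1} \int_{\mathbb R^d} \bigl( |v_t|^2 + \tfrac14 |w_t|^2 \bigr)\,\mathrm d\mu_t\,\mathrm dt \quad\text{for every } 0 \le t_0 < t_1 \le 1,$$

which yields (8.65). □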
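The last inequality in the chain of estimates of this proof is a pointwise completion of squares. As a sketch, for vectors $p, v \in \mathbb R^d$ and real numbers $s, w$ (generic symbols introduced here, standing for $\mathrm D_x \xi_t$, $v_t$, $\xi_t$ and $w_t$ at a given point):

```latex
\[
\begin{aligned}
2\Bigl( -\tfrac12\,|p|^2 + \langle p, v\rangle \Bigr) &= |v|^2 - |p - v|^2 \;\le\; |v|^2, \\
2\Bigl( -2\,s^2 + s\,w \Bigr) &= \tfrac14\,w^2 - \bigl( 2s - \tfrac12\,w \bigr)^2 \;\le\; \tfrac14\,w^2 .
\end{aligned}
\]
```

Equality holds exactly when $v = p$ and $w = 4s$, which explains the weight $\tfrac14$ in front of $|w_t|^2$ and suggests why vector fields of the form $(\mathrm D_x \xi, 4\xi)$ are the natural candidates for optimality.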


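Combining Theorems 8.16 and 8.17 with Theorem 8.4 and the geodesic property of
$(\mathcal M(\mathbb R^d), \mathsf{HK})$ we immediately have the desired dynamic representation.

Theorem 8.18 (Representation of $\mathsf{HK}$ à la Benamou-Brenier). For every $\mu_0, \mu_1 \in \mathcal M(\mathbb R^d)$
we have

$$\mathsf{HK}^2(\mu_0, \mu_1) = \min \Bigl\{ \int_0^1 \int_{\mathbb R^d} \bigl( |v_t|^2 + \tfrac14 |w_t|^2 \bigr)\,\mathrm d\mu_t\,\mathrm dt : \mu \in C([0,1]; \mathcal M(\mathbb R^d)),\ \mu_{t=i} = \mu_i,\ \partial_t \mu_t + \nabla \cdot (v_t \mu_t) = w_t \mu_t \text{ in } \mathscr D'(\mathbb R^d \times (0,1)) \Bigr\}. \tag{8.67}$$

The Borel vector field $(v, w)$ realizing the minimum in (8.67) is uniquely determined
$\mu_I$-a.e. in $\mathbb R^d \times (0,1)$.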
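As a sanity check of (8.67), one can evaluate the action along an explicit curve joining two Dirac masses on the real line. The sketch below rests on assumptions made here for illustration: the atoms are at distance less than $\pi/2$, so that $\mathsf{HK}^2(a_0\delta_{x_0}, a_1\delta_{x_1}) = a_0 + a_1 - 2\sqrt{a_0 a_1}\cos(|x_1 - x_0|)$ (the squared cone distance between $[x_0, \sqrt{a_0}]$ and $[x_1, \sqrt{a_1}]$); the cone over the line is identified locally with the plane in polar coordinates; and the candidate curve is the moving atom $a(t)\delta_{x(t)}$ obtained from the straight segment between the two cone points, with $v = \dot x$ and $w = \dot a / a$.

```python
import cmath
import math

def dirac_geodesic_cost(a0, x0, a1, x1, steps=2000):
    """Midpoint-rule approximation of the action in (8.67) along the curve
    a(t) * delta_{x(t)} induced by the straight segment between the cone points
    [x0, sqrt(a0)] and [x1, sqrt(a1)], viewed as points of the plane."""
    z0 = cmath.rect(math.sqrt(a0), x0)    # polar coordinates: radius sqrt(a0), angle x0
    z1 = cmath.rect(math.sqrt(a1), x1)

    def state(t):
        z = (1.0 - t) * z0 + t * z1       # straight segment in the plane
        return abs(z) ** 2, cmath.phase(z)  # mass a(t) and position x(t)

    dt = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        a_minus, x_minus = state(t - 0.5 * dt)
        a_plus, x_plus = state(t + 0.5 * dt)
        a, _ = state(t)
        v = (x_plus - x_minus) / dt        # velocity of the atom
        w = (a_plus - a_minus) / (dt * a)  # relative growth rate a'(t) / a(t)
        total += (v * v + 0.25 * w * w) * a * dt
    return total

def hk_squared_diracs(a0, x0, a1, x1):
    # closed form, valid under the assumption |x1 - x0| < pi/2
    return a0 + a1 - 2.0 * math.sqrt(a0 * a1) * math.cos(abs(x1 - x0))
```

Along this curve the integrand $(|v|^2 + \tfrac14 w^2)\,a$ equals the squared planar speed of the segment, so the computed action agrees with the closed form up to discretization error.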

The discussion in [27] reveals however that there may be many geodesic curves, so
in general $\mu_I$ is not unique. Indeed, the set of all geodesics connecting $\mu_0 = a_0 \delta_{x_0}$ and
$\mu_1 = a_1 \delta_{x_1}$ with $a_0, a_1 > 0$ and $|x_1 - x_0| = \pi/2$ is infinite dimensional, see [23, Sect. 5.2].

8.6 Geodesics in $\mathcal M(\mathbb R^d)$

As in the case of the Kantorovich-Wasserstein distance, one may expect that geodesics
$(\mu_t)_{t \in [0,1]}$ in $(\mathcal M(\mathbb R^d), \mathsf{HK})$ can be characterized by the system (cf. [27, Sect. 5])

$$\partial_t \mu_t + \nabla \cdot (\mu_t \mathrm D_x \xi_t) = 4\xi_t \mu_t, \qquad \partial_t \xi_t + \tfrac12 |\mathrm D_x \xi_t|^2 + 2\xi_t^2 = 0. \tag{8.68}$$

In order to give a precise meaning to (8.68) we first have to select an appropriate
regularity for $\xi_t$. On the one hand we cannot expect smoothness for solutions of the
Hamilton-Jacobi equation (8.68) (in contrast with subsolutions, that can be regularized
as in Corollary 8.14) and on the other hand the a.e. differentiability of Lipschitz
functions guaranteed by Rademacher's theorem is not sufficient, if we want to consider
arbitrary measures $\mu_t$ that could be singular with respect to the Lebesgue measure.

A convenient choice for our aims is provided by locally Lipschitz functions which are
strictly differentiable at $\mu_I$-a.e. points, where $\mu_I$ has been defined by (8.57). A function
$f : \mathbb R^d \to \mathbb R$ is strictly differentiable at $x \in \mathbb R^d$ if there exists $\mathrm Df(x) \in (\mathbb R^d)^*$ such that

$$\lim_{\substack{x', x'' \to x \\ x' \ne x''}} \frac{f(x') - f(x'') - \mathrm Df(x)(x' - x'')}{|x' - x''|} = 0. \tag{8.69}$$

According to [13, Prop. 2.2.4] a locally Lipschitz function $f$ is strictly differentiable at
$x$ if and only if the Clarke subgradient [13, Sect. 2.1] of $f$ at $x$ reduces to the singleton
$\{\mathrm Df(x)\}$. In particular, denoting by $D \subset \mathbb R^d$ the set where $f$ is differentiable and denoting
by $\kappa_\varepsilon$ a smooth convolution kernel as in (8.54), Rademacher's theorem and [13, Thm. 2.5.1]
yield, at every point $x$ of strict differentiability,

$$\lim_{\substack{x' \to x \\ x' \in D}} \mathrm Df(x') = \mathrm Df(x), \qquad \lim_{\varepsilon \downarrow 0} \mathrm D(f * \kappa_\varepsilon)(x) = \mathrm Df(x). \tag{8.70}$$
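To see why strict differentiability is a genuinely stronger requirement than differentiability, a standard one-dimensional example (not from the source) may help: take $f(x) = x^2 \sin(1/x)$ for $x \ne 0$ and $f(0) = 0$, which is locally Lipschitz and differentiable everywhere.

```latex
\[
f'(0) = \lim_{x \to 0} x \sin(1/x) = 0,
\qquad
f'(x) = 2x \sin(1/x) - \cos(1/x) \quad (x \neq 0),
\]
\[
x_k := \frac{1}{k\pi} \to 0, \qquad f'(x_k) = -\cos(k\pi) = (-1)^{k+1} \not\to 0 = f'(0).
\]
```

The derivatives along $x_k$ accumulate at both $+1$ and $-1$, so the Clarke subgradient of $f$ at $0$ is not a singleton and $f$ is not strictly differentiable there, although $\mathrm Df(0)$ exists.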

In the proofs we will also need to deal with pointwise representatives of the time derivative
of a locally Lipschitz function $\xi : \mathbb R^d \times (0,1) \to \mathbb R$: if $D(\partial_t \xi)$ denotes the set (of full
$\mathscr L^{d+1}$ measure) where $\xi$ is differentiable w.r.t. time and $\tilde\partial_t \xi$ the extension of $\partial_t \xi$ to 0
outside $D(\partial_t \xi)$, we set

$$(\partial_t \xi_t)_-(x) := \liminf_{\varepsilon \downarrow 0}\, (\tilde\partial_t \xi_t * \kappa_\varepsilon)(x), \qquad (\partial_t \xi_t)^+(x) := \limsup_{\varepsilon \downarrow 0}\, (\tilde\partial_t \xi_t * \kappa_\varepsilon)(x). \tag{8.71}$$

It is not difficult to check that such functions are Borel; even if they depend on the
specific choice of $\kappa_\varepsilon$, they will still be sufficient for our aims (a more robust definition
would require the use of approximate limits).

We are now ready to characterize the set of all geodesic curves by giving a precise
meaning to (8.68). The proof that the conditions (i)-(iv) below are sufficient for a curve to be a geodesic follows
directly from the subsequent Lemma 8.20, whereas the proof of necessity is more involved
and relies on the existence of optimal potentials for $\mathsf{LET} = \mathsf{HK}^2$ in Theorem 6.3(d),
on the characterization of subsolutions of the generalized Hamilton-Jacobi equation in
Theorem 8.11, and on the characterization of curves $t \mapsto \mu_t$ in $\mathrm{AC}^2([0,1]; (\mathcal M(\mathbb R^d), \mathsf{HK}))$.
Theorem 8.19. Let $\mu \in C^0([0,1]; \mathcal{M}(\mathbb{R}^d))$ be a weakly continuous curve. If there exists a map $\xi \in \mathrm{Lip}_{loc}((0,1); C_b(\mathbb{R}^d))$ such that

(i) $\xi_t \in \mathrm{Lip}_b(\mathbb{R}^d)$ for every $t \in (0,1)$ with $t \mapsto \mathrm{Lip}(\xi_t, \mathbb{R}^d)$ locally bounded in $(0,1)$ (equivalently, the map $(x,t) \mapsto \xi_t(x)$ is bounded and Lipschitz in $\mathbb{R}^d \times [a,b]$ for every compact subinterval $[a,b] \subset (0,1)$),

(ii) $\xi$ is strictly differentiable w.r.t. $x$ at $\mu_I$-a.e. $(x,t) \in \mathbb{R}^d \times (0,1)$,

(iii) $\xi$ satisfies

\[
\partial_t\xi_t + \tfrac12 |D_x\xi_t(x)|^2 + 2\xi_t^2(x) = 0 \qquad \mathscr{L}^{d+1}\text{-a.e. in } \mathbb{R}^d \times (0,1), \tag{8.72}
\]

(iv) and the curve $(\mu_t)_{t\in[0,1]}$ solves the continuity equation with reaction with the vector field $(D_x\xi, 4\xi)$ in every compact subinterval of $(0,1)$, i.e.

\[
\partial_t\mu_t + \nabla\cdot(\mu_t D_x\xi_t) = 4\xi_t\mu_t \qquad \text{in } \mathscr{D}'(\mathbb{R}^d \times (0,1)), \tag{8.73}
\]

then $\mu$ is a geodesic w.r.t. the $\mathsf{HK}$ distance. Conversely, if $\mu$ is a geodesic then it is possible to find $\xi \in \mathrm{Lip}_{loc}((0,1); C_b(\mathbb{R}^d))$ that satisfies the properties (i)-(iv) above, is right differentiable w.r.t. $t$ in $\mathbb{R}^d \times (0,1)$, and fulfils (8.47b) everywhere in $\mathbb{R}^d \times (0,1)$.

Notice that (8.72) seems the weakest natural formulation of the Hamilton-Jacobi equation, in view of Rademacher's Theorem. The assumption of strict differentiability of $\xi$ at $\mu_I$-a.e. point provides an admissible vector field $D_x\xi$ for (8.73).
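
As a sanity check of conditions (iii) and (iv), one can look at the spatially homogeneous case, which is not worked out in the text. If $\xi_t(x) = c/(1+2ct)$ does not depend on $x$, then (8.72) reduces to the ODE $\xi' + 2\xi^2 = 0$, and for $\mu_t = m(t)\,\delta_{x_0}$ the equation (8.73) reduces to $m' = 4\xi_t m$, whose solution is $m(t) = m_0(1+2ct)^2$, so that $\sqrt{m(t)}$ is affine in $t$. The Python sketch below checks both ODEs by central finite differences. The names `xi`, `mass`, `hj_residual`, `reaction_residual`, `speed_squared` and the constants `c`, `m0` are illustrative choices, and $c > -1/2$ is assumed so that $1+2ct > 0$ on $[0,1]$.

```python
import math


def xi(t, c):
    """Spatially constant potential xi_t = c / (1 + 2 c t)."""
    return c / (1.0 + 2.0 * c * t)


def mass(t, c, m0):
    """Total mass m(t) = m0 (1 + 2 c t)^2 of mu_t = m(t) * delta_{x0}."""
    return m0 * (1.0 + 2.0 * c * t) ** 2


def hj_residual(t, c, h=1e-5):
    """Residual of (8.72) when D_x xi = 0: d/dt xi + 2 xi^2."""
    dxi = (xi(t + h, c) - xi(t - h, c)) / (2.0 * h)
    return dxi + 2.0 * xi(t, c) ** 2


def reaction_residual(t, c, m0, h=1e-5):
    """Residual of (8.73) when D_x xi = 0: d/dt m - 4 xi m."""
    dm = (mass(t + h, c, m0) - mass(t - h, c, m0)) / (2.0 * h)
    return dm - 4.0 * xi(t, c) * mass(t, c, m0)


def speed_squared(t, c, m0):
    """Integral of |v|^2 + w^2 / 4 with v = 0 and w = 4 xi, i.e. 4 xi^2 m."""
    return 4.0 * xi(t, c) ** 2 * mass(t, c, m0)


if __name__ == "__main__":
    c, m0 = 0.3, 2.0
    for t in (0.1, 0.5, 0.9):
        print(t, hj_residual(t, c), reaction_residual(t, c, m0),
              speed_squared(t, c, m0))
```

In this special case the quantity returned by `speed_squared` is constant in $t$ and equals $4c^2 m_0 = (\sqrt{m(1)} - \sqrt{m(0)})^2$, which is what one expects for a constant-speed curve between two Dirac masses at the same point, assuming the Hellinger-type formula for that configuration.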


Proof. The proof splits into a sufficiency and a necessity part, the latter having several 
steps. 

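Sufficiency. Let us suppose that $\mu$, $\xi$ satisfy conditions (i), ..., (iv).

Since $D(\partial_t\xi)$ has full $\mathscr{L}^{d+1}$-measure in $\mathbb{R}^d \times (0,1)$, Fubini's Theorem shows that $N := \{t \in (0,1) : \mathscr{L}^d(\{x \in \mathbb{R}^d : (x,t) \notin D(\partial_t\xi)\}) > 0\}$ is $\mathscr{L}^1$-negligible. By (8.72) we get

\[
(\partial_t\xi_t)_-(x) = -\limsup_{\varepsilon\downarrow 0} \Big( \big(\tfrac12|D_x\xi_t|^2 + 2\xi_t^2\big) * \kappa_\varepsilon \Big)(x) \ge -\tfrac12 |D_X\xi_t|_a^2(x) - 2\xi_t^2(x) \tag{8.74}
\]

for every $x \in \mathbb{R}^d$ and $t \in (0,1) \setminus N$.

We apply Lemma 8.20 below with $v = D_x\xi$ and $w = 4\xi$: observing that $|D_X\xi_t|_a(x) = |D_x\xi_t(x)|$ at every point $x$ of strict differentiability of $\xi_t$, we get, for all $0 < a < b < 1$,

\[
\begin{aligned}
2\int_{\mathbb{R}^d} \xi_b \,d\mu_b - 2\int_{\mathbb{R}^d} \xi_a \,d\mu_a
&\ge 2\int_{\mathbb{R}^d\times(a,b)} \Big( (\partial_t\xi)_- + |D_x\xi_t(x)|^2 + 4\xi_t^2(x) \Big) \,d\mu_I \\
&\overset{(8.74)}{\ge} 2\int_{\mathbb{R}^d\times(a,b)} \Big( \tfrac12|D_x\xi_t(x)|^2 + 2\xi_t^2(x) \Big) \,d\mu_I
\ge \int_a^b |\mu_t'|^2 \,dt \ge \frac{1}{b-a}\,\mathsf{HK}^2(\mu_a,\mu_b).
\end{aligned}
\]

On the other hand, since $\mathbb{R}^d$ is a length space, Theorem 8.12 yields

\[
\frac{1}{b-a}\,\mathsf{HK}^2(\mu_a,\mu_b) \ge 2\int_{\mathbb{R}^d} \xi_b \,d\mu_b - 2\int_{\mathbb{R}^d} \xi_a \,d\mu_a,
\]

so that all the above inequalities are in fact identities and, hence,

\[
\mathsf{HK}(\mu_a,\mu_b) = (b-a)\,|\mu_t'| \qquad \mathscr{L}^1\text{-a.e. in } [a,b].
\]

This shows that $\mu$ is a geodesic. Passing to the limit as $a \downarrow 0$ and $b \uparrow 1$ we conclude the proof of the first part of the Theorem.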

Necessity. Let $(\mu_t)_{t\in[0,1]}$ be a $\mathsf{HK}$-geodesic in $\mathcal{M}(\mathbb{R}^d)$ connecting $\mu_0$ to $\mu_1$; applying Theorem 8.16 we can find a Borel vector field $(v,w) \in L^2(\mathbb{R}^d \times (0,1), \mu_I; \mathbb{R}^{d+1})$ such that (8.59) and (8.62) hold. We also consider an optimal plan $\gamma \in \mathrm{Opt}_{\mathsf{LET}}(\mu_0,\mu_1)$.

Let $\psi_1, \psi_2 : \mathbb{R}^d \to [-\infty, 1]$ be a pair of optimal potentials given by Theorem 6.3 d) and let us set $\xi_0 := -\frac12\psi_1$ and $\xi_t := \mathscr{P}_t\xi_0$ for $t \in (0,1)$. Even if we are considering more general initial data $\xi_0 \in \mathrm{B}(\mathbb{R}^d; [-1/2,+\infty])$ in (8.45), it is not difficult to check that the same statement of Theorem 8.11 holds in every subinterval $[a,b]$ with $0 < a < b < 1$ and

\[
\lim_{t\downarrow 0} \xi_t = \sup_{t>0} \xi_t = (\xi_0)_*, \qquad \text{where } (\xi_0)_*(x) := \lim_{r\downarrow 0}\, \inf_{x'\in B_r(x)} \xi_0(x') \tag{8.75}
\]

is the lower semicontinuous envelope of $\xi_0$. Moreover, setting






(8.76) 


the function $\xi_1$ is upper semicontinuous, and optimality yields

\[
\tfrac12\psi_2(x) = \xi_1(x) \qquad \text{for } \gamma_2\text{-a.a. } x \in \mathbb{R}^d. \tag{8.77}
\]


By introducing the semigroup $\bar{\mathscr{P}}_t\xi := -\mathscr{P}_t(-\xi)$ and reversing time, we can define

\[
\bar\xi_t := \bar{\mathscr{P}}_{1-t}\,\xi_1. \tag{8.78}
\]

By using the link with the Hopf-Lax semigroup in $\mathfrak{C}$ given by Theorem 8.11, the optimality of $(\psi_1,\psi_2)$, and arguing as in [13 Thm. 7.36] it is not difficult to check that

\[
\bar\xi_t \le \xi_t \ \text{ in } \mathbb{R}^d, \qquad \bar\xi_0 = \xi_0 = -\tfrac12\psi_1 \quad \mu_0\text{-a.e. in } \mathbb{R}^d. \tag{8.79}
\]

Notice that the function $x \mapsto -\cos^2(|x - x'| \wedge \pi/2)$ has bounded first and second derivatives, so it is semiconcave. It follows that the map $x \mapsto \xi_t(x)$ is semiconcave for every $t \in (0,1)$ and $x \mapsto \bar\xi_t(x)$ is semiconvex.

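Since $t \mapsto \int \xi_t \,d\mu_t$ and $t \mapsto \int \bar\xi_t \,d\mu_t$ are absolutely continuous in $(0,1)$, Theorem 8.12 yields

\[
\frac{d}{dt}\int_{\mathbb{R}^d} \xi_t \,d\mu_t \le \tfrac12 |\mu_t'|^2 \qquad \text{a.e. in } (0,1), \tag{8.80}
\]

so that

\[
2\int_{\mathbb{R}^d} \xi_b \,d\mu_b - 2\int_{\mathbb{R}^d} \xi_a \,d\mu_a \le (b-a)\,\mathsf{HK}^2(\mu_0,\mu_1).
\]

Passing to the limit first as $a \downarrow 0$ and then as $b \uparrow 1$ by monotone convergence (notice that $\xi \le 1/2$) and using optimality once again, we obtain

\[
\begin{aligned}
\mathsf{HK}^2(\mu_0,\mu_1) &= \int_{\mathbb{R}^d} \psi_1 \,d\mu_0 + \int_{\mathbb{R}^d} \psi_2 \,d\mu_1 = 2\int_{\mathbb{R}^d} \xi_1 \,d\mu_1 - 2\int_{\mathbb{R}^d} \xi_0 \,d\mu_0 \\
&= \lim_{a\downarrow 0,\ b\uparrow 1} 2\Big( \int_{\mathbb{R}^d} \xi_b \,d\mu_b - \int_{\mathbb{R}^d} \xi_a \,d\mu_a \Big).
\end{aligned} \tag{8.81}
\]

By (8.80) it follows that

\[
\frac{d}{dt}\int_{\mathbb{R}^d} \xi_t \,d\mu_t = \tfrac12 |\mu_t'|^2 = \tfrac12\,\mathsf{HK}^2(\mu_0,\mu_1) \qquad \text{in } (0,1). \tag{8.82}
\]

Reversing time, the analogous argument yields

\[
\frac{d}{dt}\int_{\mathbb{R}^d} \bar\xi_t \,d\mu_t = \tfrac12 |\mu_t'|^2 = \tfrac12\,\mathsf{HK}^2(\mu_0,\mu_1) \qquad \text{in } (0,1). \tag{8.83}
\]

Hence, we have proved that the maps $t \mapsto \int \xi_t \,d\mu_t$ and $t \mapsto \int \bar\xi_t \,d\mu_t$ are affine in $[0,1]$ and coincide at $t = 0$ and $t = 1$, which implies that

\[
\int_{\mathbb{R}^d} \xi_t \,d\mu_t = \int_{\mathbb{R}^d} \bar\xi_t \,d\mu_t \qquad \text{for every } t \in [0,1]. \tag{8.84}
\]

Recalling (8.79), we deduce that the complement of the set $Z_t := \{x \in \mathbb{R}^d : \xi_t(x) = \bar\xi_t(x)\}$ is $\mu_t$-negligible. Since $\xi_t$ is Lipschitz and semiconcave (thus everywhere superdifferentiable) for $t \in (0,1)$ and since $\bar\xi_t$ is Lipschitz and semiconvex (thus everywhere subdifferentiable), we conclude that $\xi_t$ is strictly differentiable in $Z_t$, and thus it satisfies conditions (i) and (ii).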

Since (iii) is guaranteed by Theorem 8.11 ($\mathbb{R}^d$ is a length space), it remains to check (8.73). We apply the following Lemma 8.20 by observing that [31, Prop. 3.2, 3.3] and Theorem 8.11 yield

limsup < limsup c?t~6(^0 ^ liminf ^ 

x'^x x'^x 

since = df^t{.x) /i/-a.e. we get 

\[
(\partial_t\xi)^+ = (\partial_t\xi)_- = \partial_t\xi \qquad \mu_I\text{-a.e.}
\]

and therefore (8.85) holds with equality.

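Recalling that $|D_X\xi_t|_a^2(x) = |D_x\xi_t(x)|^2$ at every point of $Z_t$, for every $0 < a < b < 1$ we have

\[
\begin{aligned}
\frac{b-a}{2}\,\mathsf{HK}^2(\mu_0,\mu_1)
&= \int_{\mathbb{R}^d} \xi_b \,d\mu_b - \int_{\mathbb{R}^d} \xi_a \,d\mu_a
= \int_{\mathbb{R}^d\times(a,b)} \big( \partial_t\xi_t + D_x\xi_t\cdot v + \xi_t w \big) \,d\mu_I \\
&= \int_{\mathbb{R}^d\times(a,b)} \big( -\tfrac12|D_x\xi_t|^2 - 2\xi_t^2 + D_x\xi_t\cdot v + \xi_t w \big) \,d\mu_I \\
&= \int_{\mathbb{R}^d\times(a,b)} \big( -\tfrac12|D_x\xi_t - v|^2 - 2(\xi_t - \tfrac14 w)^2 + \tfrac12|v|^2 + \tfrac18 w^2 \big) \,d\mu_I \\
&\overset{(8.62)}{\le} -\int_{\mathbb{R}^d\times(a,b)} \big( \tfrac12|D_x\xi_t - v|^2 + 2(\xi_t - \tfrac14 w)^2 \big) \,d\mu_I + \frac12\int_a^b |\mu_t'|^2 \,dt.
\end{aligned}
\]

Since $\frac12\int_a^b |\mu_t'|^2\,dt = \frac{b-a}{2}\,\mathsf{HK}^2(\mu_0,\mu_1)$ along a geodesic, the first integral on the right-hand side must vanish. We deduce that $v = D_x\xi$ and $w = 4\xi$ holds $\mu_I$-a.e.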


□ 


The following lemma provides the "integration by parts" formulas that were used in the sufficiency and necessity parts of the previous proof of Theorem 8.19. It is established by a suitable temporal and spatial smoothing, involving a smooth kernel as in (8.54).

Lemma 8.20. Let $\mu \in \mathrm{AC}^2_{loc}((0,1); (\mathcal{M}(\mathbb{R}^d), \mathsf{HK}))$ satisfy the continuity equation with reaction (8.59) governed by the field $(v,w) \in L^2(\mathbb{R}^d \times (a,b), \mu_I; \mathbb{R}^{d+1})$ for every $[a,b] \subset (0,1)$. If $\xi \in \mathrm{Lip}_{loc}((0,1); C_b(\mathbb{R}^d))$ satisfies conditions (i), (ii) of Theorem 8.19 then for all $0 < a < b < 1$ we have

\[
\int_{\mathbb{R}^d\times(a,b)} \big( (\partial_t\xi)^+ + D_x\xi\cdot v + \xi w \big) \,d\mu_I
\ \ge\ \int_{\mathbb{R}^d} \xi_b \,d\mu_b - \int_{\mathbb{R}^d} \xi_a \,d\mu_a
\ \ge\ \int_{\mathbb{R}^d\times(a,b)} \big( (\partial_t\xi)_- + D_x\xi\cdot v + \xi w \big) \,d\mu_I, \tag{8.85}
\]

where $(\partial_t\xi)^+$, $(\partial_t\xi)_-$ are defined in terms of a space convolution kernel $\kappa_\varepsilon$ as in (8.71).

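Proof. We fix a compact subinterval $[a,b] \subset (0,1)$, $b' \in (b,1)$, and set $M := \max_{t\in[a,b']} \mu_t(\mathbb{R}^d)$ and $L := \mathrm{Lip}(\xi; \mathbb{R}^d \times [a,b']) + \sup_{\mathbb{R}^d\times[a,b']} |\xi|$.

We regularize $\xi$ by space convolution as in (8.54) by setting $\xi^\varepsilon := \xi * \kappa_\varepsilon$ and perform a further regularization in time, viz.

\[
\xi_t^{\varepsilon,\tau}(x) := \frac{1}{\tau}\int_0^\tau \xi^\varepsilon_{t+r}(x) \,dr, \qquad 0 < \tau < b' - b. \tag{8.86}
\]

Since $\xi^{\varepsilon,\tau} \in C^1(\mathbb{R}^d \times [a,b])$, we can argue as in the proof of Theorem 8.17 and obtain, for every $\varepsilon > 0$ and $\tau \in (0, b'-b)$, the identity

\[
\int_{\mathbb{R}^d} \xi_b^{\varepsilon,\tau} \,d\mu_b - \int_{\mathbb{R}^d} \xi_a^{\varepsilon,\tau} \,d\mu_a
= \int_{\mathbb{R}^d\times(a,b)} \big( \partial_t\xi^{\varepsilon,\tau} + D_x\xi^{\varepsilon,\tau}\cdot v + \xi^{\varepsilon,\tau} w \big) \,d\mu_I. \tag{8.87}
\]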


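We first pass to the limit as $\tau \downarrow 0$, observing that $\xi^{\varepsilon,\tau} \to \xi^\varepsilon$ uniformly because $\xi^\varepsilon$ is bounded and Lipschitz. Similarly, since $D_x\xi^{\varepsilon,\tau} = (D_x\xi^\varepsilon)^\tau$ and $D_x\xi^\varepsilon$ is bounded and Lipschitz, we have $D_x\xi^{\varepsilon,\tau} \to D_x\xi^\varepsilon$ uniformly. Finally, using

\[
\partial_t\xi_t^{\varepsilon,\tau}(x) = \frac{1}{\tau}\big( \xi^\varepsilon_{t+\tau}(x) - \xi^\varepsilon_t(x) \big)
= \int_{\mathbb{R}^d} \frac{1}{\tau}\big( \xi_{t+\tau}(x') - \xi_t(x') \big)\, \kappa_\varepsilon(x - x') \,dx',
\]

and the fact that $N := \{t \in (0,1) : \mathscr{L}^d(\{x \in \mathbb{R}^d : (x,t) \notin D(\partial_t\xi)\}) > 0\}$ is $\mathscr{L}^1$-negligible by the theorems of Rademacher and Fubini, an application of Lebesgue's Dominated Convergence Theorem yields

\[
\lim_{\tau\downarrow 0} \partial_t\xi_t^{\varepsilon,\tau}(x) = \partial_t\xi_t^\varepsilon(x) = \big( (\tilde\partial_t\xi_t) * \kappa_\varepsilon \big)(x) \qquad \text{for every } x \in \mathbb{R}^d,\ t \in (a,b) \setminus N. \tag{8.88}
\]

Since $\mathbb{R}^d \times N$ is also $\mu_I$-negligible, a further application of Lebesgue's Dominated Convergence Theorem yields

\[
\int_{\mathbb{R}^d} \xi_b^\varepsilon \,d\mu_b - \int_{\mathbb{R}^d} \xi_a^\varepsilon \,d\mu_a
= \int_{\mathbb{R}^d\times(a,b)} \big( \partial_t\xi^\varepsilon + D_x\xi^\varepsilon\cdot v + \xi^\varepsilon w \big) \,d\mu_I. \tag{8.89}
\]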

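Now, (8.85) will be deduced by passing to the limit $\varepsilon \downarrow 0$ in (8.89) as follows. We observe that $\xi^\varepsilon$ converges uniformly to $\xi$ because $\xi$ is bounded and Lipschitz. Moreover, since $\lim_{\varepsilon\downarrow 0} D_x\xi_t^\varepsilon(x) = D_x\xi_t(x)$ at every point $x \in \mathbb{R}^d$ where $\xi_t$ is strictly differentiable, we obtain

\[
|D_x\xi^\varepsilon\cdot v| \le L|v| \in L^1(\mathbb{R}^d \times (a,b); \mu_I) \quad \text{and} \quad \lim_{\varepsilon\downarrow 0} D_x\xi^\varepsilon = D_x\xi \quad \mu_I\text{-a.e. in } \mathbb{R}^d \times [a,b],
\]

so that

\[
\lim_{\varepsilon\downarrow 0} \int_{\mathbb{R}^d} \xi_s^\varepsilon \,d\mu_s = \int_{\mathbb{R}^d} \xi_s \,d\mu_s \ \ (s \in \{a,b\}), \qquad
\lim_{\varepsilon\downarrow 0} \int_{\mathbb{R}^d\times(a,b)} \big( D_x\xi^\varepsilon\cdot v + \xi^\varepsilon w \big) \,d\mu_I = \int_{\mathbb{R}^d\times(a,b)} \big( D_x\xi\cdot v + \xi w \big) \,d\mu_I.
\]

Finally, since $\partial_t\xi^\varepsilon$ is also uniformly bounded, Fatou's Lemma yields

\[
\limsup_{\varepsilon\downarrow 0} \int_{\mathbb{R}^d\times(a,b)} \partial_t\xi^\varepsilon \,d\mu_I \le \int_{\mathbb{R}^d\times(a,b)} (\partial_t\xi)^+ \,d\mu_I, \qquad
\liminf_{\varepsilon\downarrow 0} \int_{\mathbb{R}^d\times(a,b)} \partial_t\xi^\varepsilon \,d\mu_I \ge \int_{\mathbb{R}^d\times(a,b)} (\partial_t\xi)_- \,d\mu_I.
\]

Thus, (8.85) follows from (8.89).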


□ 


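8.7 Contraction properties: convolution and Heat equation in RCD$(0,\infty)$ metric-measure spaces.

We conclude this paper with a few applications concerning contraction properties of the $\mathsf{HK}$ distance. The first one concerns the behavior with respect to 1-Lipschitz maps.

Lemma 8.21. Let $(X, \mathsf{d}_X)$, $(Y, \mathsf{d}_Y)$ be separable metric spaces and let $f : X \to Y$ be a 1-Lipschitz map. Then $f_\sharp : \mathcal{M}(X) \to \mathcal{M}(Y)$ is 1-Lipschitz w.r.t. $\mathsf{HK}$:

\[
\mathsf{HK}(f_\sharp\mu_1, f_\sharp\mu_2) \le \mathsf{HK}(\mu_1,\mu_2). \tag{8.90}
\]

Proof. It is sufficient to observe that the map $\mathfrak{f} : \mathfrak{C}_X \to \mathfrak{C}_Y$ defined by $\mathfrak{f}([x,r]) := [f(x), r]$ satisfies $\mathsf{d}_{\mathfrak{C}_Y}(\mathfrak{f}([x_1,r_1]), \mathfrak{f}([x_2,r_2])) \le \mathsf{d}_{\mathfrak{C}_X}([x_1,r_1],[x_2,r_2])$ for every $[x_i,r_i] \in \mathfrak{C}_X$. Thus $\mathfrak{f}_\sharp$ is a contraction from $(\mathcal{P}_2(\mathfrak{C}_X), \mathsf{W}_{\mathsf{d}_{\mathfrak{C}_X}})$ to $(\mathcal{P}_2(\mathfrak{C}_Y), \mathsf{W}_{\mathsf{d}_{\mathfrak{C}_Y}})$, and hence $f_\sharp$ satisfies (8.90). □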
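
The key inequality in this proof can be checked numerically. The Python sketch below assumes that the cone distance has the usual form $\mathsf{d}_{\mathfrak{C}}^2([x_1,r_1],[x_2,r_2]) = r_1^2 + r_2^2 - 2r_1r_2\cos(\mathsf{d}(x_1,x_2) \wedge \pi)$; this formula is recalled from the standard cone construction and is not restated in the present section. The test map $f = \sin$ on the real line is a hypothetical example of a 1-Lipschitz map, and the function names are illustrative. Since $\cos$ is decreasing on $[0,\pi]$, shrinking the base distance can only shrink the cone distance.

```python
import math
import random


def cone_dist_sq(d, r1, r2):
    """Squared cone distance between [x1, r1] and [x2, r2] for base distance d."""
    return r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * math.cos(min(d, math.pi))


def lifted_map_is_contraction(f, samples=10000, seed=0, tol=1e-12):
    """Check the cone inequality for the lift [x, r] -> [f(x), r] on random points.

    The base space is the real line with d(x1, x2) = |x1 - x2|.
    """
    rng = random.Random(seed)
    for _ in range(samples):
        x1, x2 = rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)
        r1, r2 = rng.uniform(0.0, 3.0), rng.uniform(0.0, 3.0)
        before = cone_dist_sq(abs(x1 - x2), r1, r2)
        after = cone_dist_sq(abs(f(x1) - f(x2)), r1, r2)
        if after > before + tol:
            return False
    return True


if __name__ == "__main__":
    # sin is 1-Lipschitz, so the lifted map should never increase distances.
    print(lifted_map_is_contraction(math.sin))
    # x -> 2x is not 1-Lipschitz, so a violation is expected.
    print(lifted_map_is_contraction(lambda x: 2.0 * x))
```

The comparison is done on squared distances to avoid amplifying rounding errors through the square root when the two cone points nearly coincide.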

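A second application concerns convolutions in $\mathbb{R}^d$.

Theorem 8.22. Let $X = \mathbb{R}^d$ with the Euclidean distance and let $\nu \in \mathcal{M}(\mathbb{R}^d)$. Then the map $\mu \mapsto \mu * \nu$ is contractive w.r.t. $\mathsf{HK}$ if $\nu(\mathbb{R}^d) = 1$ and, more generally,

\[
\mathsf{HK}^2(\mu_1 * \nu, \mu_2 * \nu) \le \nu(\mathbb{R}^d)\,\mathsf{HK}^2(\mu_1,\mu_2) \qquad \text{for every } \mu_1,\mu_2 \in \mathcal{M}(\mathbb{R}^d). \tag{8.91}
\]

Proof. The previous lemma shows that $\mathsf{HK}$ is invariant by isometries, in particular translations in $\mathbb{R}^d$, so that

\[
\mathsf{HK}(\mu_1 * \delta_x, \mu_2 * \delta_x) = \mathsf{HK}(\mu_1,\mu_2) \qquad \text{for every } \mu_1,\mu_2 \in \mathcal{M}(\mathbb{R}^d),\ x \in \mathbb{R}^d.
\]

By the subadditivity property it follows that if $\nu = \sum_k a_k\delta_{x_k}$ for some $a_k > 0$, then

\[
\begin{aligned}
\mathsf{HK}^2(\mu_1 * \nu, \mu_2 * \nu) &= \mathsf{HK}^2\Big( \sum_k a_k\,\mu_1 * \delta_{x_k},\ \sum_k a_k\,\mu_2 * \delta_{x_k} \Big) \\
&\le \sum_k a_k\,\mathsf{HK}^2(\mu_1 * \delta_{x_k}, \mu_2 * \delta_{x_k}) = \sum_k a_k\,\mathsf{HK}^2(\mu_1,\mu_2) = \nu(\mathbb{R}^d)\,\mathsf{HK}^2(\mu_1,\mu_2).
\end{aligned}
\]

The general case then follows by approximating $\nu$ by a sequence of discrete measures $\nu_n$ converging to $\nu$ in $\mathcal{M}(\mathbb{R}^d)$ and observing that $\mu_i * \nu_n \to \mu_i * \nu$ weakly in $\mathcal{M}(\mathbb{R}^d)$. Since $\mathsf{HK}$ is weakly continuous we obtain (8.91). □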

An easy application of the previous result is the contraction property of the (adjoint) Heat semigroup $(P_t^*)_{t\ge 0}$ in $\mathbb{R}^d$ with respect to $\mathsf{HK}$. In fact, we can prove a much more general result for the Heat flow in RCD$(0,\infty)$ metric measure spaces $(X, \mathsf{d}, \mathfrak{m})$ [ll[5]. It covers the case of the semigroups $(P_t)_{t\ge 0}$ generated by

(A) the Heat equation on an open convex domain $\Omega \subset \mathbb{R}^d$ with homogeneous Neumann conditions

\[
\partial_t u = \Delta u \ \text{ in } \Omega \times (0,\infty), \qquad \partial_n u = 0 \ \text{ on } \partial\Omega \times (0,\infty),
\]

(B) the Heat equation on a complete Riemannian manifold $(\mathbb{M}^d, g)$ with nonnegative Ricci curvature defined by

\[
\partial_t u = \Delta_g u \ \text{ in } \mathbb{M}^d \times (0,\infty),
\]

where $\Delta_g$ is the usual Laplace-Beltrami operator, and

(C) the Fokker-Planck equation in $\mathbb{R}^d$ generated by the gradient of a convex potential $V : \mathbb{R}^d \to \mathbb{R}$, viz.

\[
\partial_t u = \Delta_g u - \nabla\cdot(u\,DV) \ \text{ in } \mathbb{R}^d \times (0,\infty).
\]

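Theorem 8.23. Let $(X, \mathsf{d}, \mathfrak{m})$ be a complete and separable metric-measure space with nonnegative Riemannian Ricci Curvature, i.e. satisfying the RCD$(0,\infty)$ condition, and let $(P_t^*)_{t\ge 0} : \mathcal{M}(X) \to \mathcal{M}(X)$ be the Heat semigroup in the measure setting. Then

\[
\mathsf{HK}(P_t^*\mu_1, P_t^*\mu_2) \le \mathsf{HK}(\mu_1,\mu_2) \qquad \text{for all } \mu_1,\mu_2 \in \mathcal{M}(X) \text{ and } t \ge 0. \tag{8.92}
\]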


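Proof. Recall that in RCD$(0,\infty)$ metric measure spaces the $L^2$-gradient flow of the Cheeger energy induces a symmetric Markov semigroup $(P_t)_{t\ge 0}$ in $L^2(X,\mathfrak{m})$, which has a pointwise version satisfying the Feller regularization property $P_t(\mathrm{B}_b(X)) \subset \mathrm{Lip}_b(X)$ for $t > 0$ and the estimate

\[
|D_X P_t f|^2(x) \le P_t\big( |D_X f|^2 \big)(x) \qquad \text{for every } f \in \mathrm{Lip}_b(X),\ x \in X,\ t > 0. \tag{8.93}
\]

Its adjoint $(P_t^*)_{t\ge 0}$ coincides with the Kantorovich-Wasserstein gradient flow in $\mathcal{P}_2(X)$ of the Entropy Functional $\mathscr{F}(\cdot|\mathfrak{m})$, where $\mathscr{F}$ is induced by $F(s) = U_1(s) = s\log s - s + 1$, and defines a semigroup in $\mathcal{M}(X)$ by the formula

\[
\int_X f \,d(P_t^*\mu) = \int_X P_t f \,d\mu \qquad \text{for every } f \in \mathrm{B}_b(X) \text{ and } \mu \in \mathcal{M}(X). \tag{8.94}
\]

In order to prove (8.92) we use (8.50) (RCD-spaces satisfy the length property) and apply $P_t$ to a subsolution $(\psi_\theta)_{\theta\in[0,1]}$ in $C^1([0,1]; \mathrm{Lip}_b(X))$ of the Hamilton-Jacobi equation

\[
\partial_\theta\psi_\theta + \tfrac14 |D_X\psi_\theta|^2 + \psi_\theta^2 \le 0 \qquad \text{in } X \times (0,1). \tag{8.95}
\]

Since $P_t$ is a linear and continuous map from $\mathrm{Lip}_b(X)$ to $\mathrm{Lip}_b(X)$, the curve $\theta \mapsto \psi_{\theta,t} := P_t(\psi_\theta)$ belongs to $C^1([0,1]; \mathrm{Lip}_b(X))$. Now, (8.93) and the Markov property yield

\[
|D_X P_t\psi_\theta|^2(x) \le P_t\big( |D_X\psi_\theta|^2 \big)(x), \qquad (P_t\psi_\theta)^2(x) \le P_t(\psi_\theta^2)(x) \qquad \text{for } x \in X,\ \theta \in [0,1],\ t > 0.
\]

Thus, for every $t > 0$ we obtain

\[
\partial_\theta\psi_{\theta,t} + \tfrac14 |D_X\psi_{\theta,t}|^2 + \psi_{\theta,t}^2 \le 0 \qquad \text{in } X \times (0,1),
\]

and therefore

\[
\int_X \psi_1 \,d(P_t^*\mu_1) - \int_X \psi_0 \,d(P_t^*\mu_0) = \int_X P_t\psi_1 \,d\mu_1 - \int_X P_t\psi_0 \,d\mu_0 \le \mathsf{HK}^2(\mu_1,\mu_0).
\]

We conclude by taking the supremum with respect to all the subsolutions of (8.95) in $C^1([0,1]; \mathrm{Lip}_b(X))$ and applying (8.50). □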
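
The two pointwise inequalities used in this proof can be illustrated in the model case $X = \mathbb{R}$ with the Lebesgue measure, where $P_t$ is convolution with a Gaussian of variance $2t$. The Python sketch below is only an illustration under that assumption: the quadrature rule (`heat_weights`, with its parameters `n` and `width`), the test function `psi` and the helper `check` are hypothetical choices. Because the discrete operator `P` is a translation-invariant average with nonnegative weights summing to one, it commutes with differentiation, so that $D_x P_t\psi = P_t(\psi')$, and both inequalities are instances of Jensen's inequality.

```python
import math


def heat_weights(t, n=201, width=6.0):
    """Nodes and normalized Gaussian weights for the heat kernel of u_t = u_xx.

    The kernel at time t has variance 2t; the nodes cover +-width standard deviations.
    """
    sigma = math.sqrt(2.0 * t)
    nodes = [(-width + 2.0 * width * i / (n - 1)) * sigma for i in range(n)]
    raw = [math.exp(-y * y / (4.0 * t)) for y in nodes]
    total = sum(raw)
    return nodes, [w / total for w in raw]


def P(t, f, x):
    """Discrete Markov operator: Gaussian-weighted average of f around x."""
    nodes, weights = heat_weights(t)
    return sum(w * f(x + y) for y, w in zip(nodes, weights))


def psi(x):
    """Bounded Lipschitz test function."""
    return math.sin(3.0 * x)


def dpsi(x):
    """Derivative of psi."""
    return 3.0 * math.cos(3.0 * x)


def check(t, x, tol=1e-12):
    """Evaluate the gradient estimate and the Markov (Jensen) estimate at (t, x)."""
    grad_ok = P(t, dpsi, x) ** 2 <= P(t, lambda z: dpsi(z) ** 2, x) + tol
    markov_ok = P(t, psi, x) ** 2 <= P(t, lambda z: psi(z) ** 2, x) + tol
    return grad_ok, markov_ok


if __name__ == "__main__":
    print(check(0.1, 0.4))
```

In a general RCD$(0,\infty)$ space the gradient estimate (8.93) is a genuine curvature condition and is not a consequence of Jensen's inequality alone; the commutation used here is special to the flat, translation-invariant setting.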


A On the chronological development of our theory 

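In this section we give a brief account of the order in which we developed the different parts of the theory. The beginning was the mostly formal work in IZH on reaction-diffusion systems, where a distance on vectors $u$ of densities over a domain $\Omega \subset \mathbb{R}^d$ was formally defined in the Benamou-Brenier sense via

\[
d(u_0,u_1)^2 = \inf\Big\{ \int_0^1\!\!\int_\Omega \Big( \nabla\Xi_t : \mathbb{M}_{\mathrm{diff}}(u_t)\nabla\Xi_t + \Xi_t\cdot\mathbb{K}_{\mathrm{react}}(u_t)\,\Xi_t \Big) \,dx\,dt \Big\}
\]

under the constraint of the continuity equation $\partial_t u_t + \nabla\cdot\big( \mathbb{M}_{\mathrm{diff}}(u_t)\nabla\Xi_t \big) = \mathbb{K}_{\mathrm{react}}(u_t)\,\Xi_t$. The central question was and still is the understanding of diffusion equations with reactions in the gradient-flow form $\partial_t u = \nabla\cdot\big( \mathbb{M}_{\mathrm{diff}}(u)\nabla\delta\mathscr{F}(u) \big) - \mathbb{K}_{\mathrm{react}}(u)\,\delta\mathscr{F}(u)$, see [27, Sect. 5.1].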

It was natural to treat the scalar case first and to restrict to the case where both the mobility operator $\mathbb{M}_{\mathrm{diff}}(u)$ and $\mathbb{K}_{\mathrm{react}}(u)$ are linear in $u$. Only in that case the formally derived system (1.29) for the geodesics $(u_t, \xi_t)$ decouples in the sense that $\xi_t$ solves a Hamilton-Jacobi equation that does not depend on $u$. Choosing $\mathbb{M}_{\mathrm{diff}}(u) = \alpha u$ and $\mathbb{K}_{\mathrm{react}}(u) = \beta u$ with $\alpha, \beta > 0$, the relevant Hamilton-Jacobi equation reads

\[
\partial_t\xi_t + \frac{\alpha}{2}|\nabla_x\xi_t|^2 + \frac{\beta}{2}\xi_t^2 = 0.
\]

As in the other parts of this paper, we restrict to the case $\alpha = 1$ and $\beta = 4$ subsequently, but refer to izu for the general case. Thus, the conjectured characterization (8.50) was first presented in Pisa at the Workshop "Optimal Transportation and Applications" in November 2012.

During a visit of the second author in Pavia, the generalized Hopf-Lax formula via the nonlinear convolution $\mathscr{P}_t$ (cf. (8.45)) was derived via the classical method of characteristics. This led to the unsymmetric representation (1.26) for $\mathsf{HK}$. To symmetrize this relation we used that $\mathscr{P}_1\xi(x) = \inf_y \Phi(\xi(y), |y-x|)$ with $\Phi(z,R) = \frac12\big( 1 - \frac{A(R)}{1+2z} \big)$, where $A(R) = \cos^2(R \wedge (\pi/2))$. Setting $\psi_0 = -2\xi_0$ and $\psi_1 = 2\xi_1 = 2\mathscr{P}_1\xi_0$, we have the equivalence

\[
\xi_1 = \mathscr{P}_1\xi_0 \quad\Longleftrightarrow\quad (1-\psi_0(x_0))(1-\psi_1(x_1)) \ge A(|x_0 - x_1|) \ \text{ for all } x_i.
\]

Setting $\varphi_i = -\log(1-\psi_i)$ we arrived at the cost function

\[
c(x_0,x_1) = \begin{cases} -\log\big( \cos^2(|x_0 - x_1|) \big) & \text{for } |x_0 - x_1| < \pi/2, \\ \infty & \text{otherwise,} \end{cases}
\]

for the first time and obtained the characterization (1.7), namely

\[
\mathsf{HK}(\mu_0,\mu_1)^2 = \mathsf{D}(\mu_0,\mu_1) = \sup\big\{ \mathscr{D}(\varphi_0,\varphi_1|\mu_0,\mu_1) : \varphi_0 \oplus \varphi_1 \le c \big\}.
\]

It was then easy to dualize and the Logarithmic Entropy functional $\mathsf{LET}$ in (1.20) was derived in July 2013.
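
The change of variables $\varphi_i = -\log(1-\psi_i)$ described above can be checked directly: the product constraint $(1-\psi_0)(1-\psi_1) \ge A(R)$ and the additive constraint $\varphi_0 + \varphi_1 \le c$ should select the same pairs. The Python sketch below tests this on random samples with $\psi_i < 1$. The function names and the sampling ranges are illustrative assumptions, and samples too close to equality are skipped because floating-point rounding could decide them either way.

```python
import math
import random


def A(R):
    """A(R) = cos^2(min(R, pi/2))."""
    return math.cos(min(R, math.pi / 2.0)) ** 2


def cost(R):
    """c = -log(cos^2(R)) for R < pi/2, +infinity otherwise."""
    if R < math.pi / 2.0:
        return -math.log(math.cos(R) ** 2)
    return math.inf


def product_form(psi0, psi1, R):
    """Constraint in the psi variables: (1 - psi0)(1 - psi1) >= A(R)."""
    return (1.0 - psi0) * (1.0 - psi1) >= A(R)


def log_form(psi0, psi1, R):
    """Constraint in the phi variables: phi0 + phi1 <= c, with phi = -log(1 - psi)."""
    phi0 = -math.log(1.0 - psi0)
    phi1 = -math.log(1.0 - psi1)
    return phi0 + phi1 <= cost(R)


def forms_agree(samples=10000, seed=1, margin=1e-9):
    """Compare both constraints on random samples, skipping near-equality cases."""
    rng = random.Random(seed)
    for _ in range(samples):
        psi0, psi1 = rng.uniform(-2.0, 0.99), rng.uniform(-2.0, 0.99)
        R = rng.uniform(0.0, 3.0)
        gap = (1.0 - psi0) * (1.0 - psi1) - A(R)
        if abs(gap) < margin:
            continue
        if product_form(psi0, psi1, R) != log_form(psi0, psi1, R):
            return False
    return True


if __name__ == "__main__":
    print(forms_agree())
```

For $R \ge \pi/2$ the cost is infinite and the additive constraint is vacuous, which matches the fact that $A(R)$ vanishes there (up to rounding in the evaluation of $\cos(\pi/2)$).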

While the existence of minimizers for $\mathsf{LET}(\mu_0,\mu_1) = \min \mathscr{E}(\gamma|\mu_0,\mu_1)$ was easily obtained, it was not clear at all why and how $\mathsf{HK}$ defined via $\mathsf{HK}^2(\mu_0,\mu_1) = \min \mathscr{E}(\gamma|\mu_0,\mu_1)$ generates a geodesic distance. The only thing which could easily be checked was that the minimum was consistent with the distance between two Dirac masses, which could easily be calculated via the dynamic formulation.

So, in parallel we tried to develop the dynamic approach, which was not too successful at the early stages. Only after realizing and exploiting the connection to the cone distance in Summer and Autumn of 2013 we were able to connect $\mathsf{LET}$ systematically with the dynamic approach. The crucial and surprising observation was that optimal plans for $\mathscr{E}$ and lifts of measures $\mu \in \mathcal{M}(X)$ to measures $\lambda$ on the cone $\mathfrak{C}$ could be identified by exploiting the optimality conditions systematically. Corresponding results were presented in workshops on Optimal Transport in Banff (June 2014) and Pisa (November 2014).

Already at the Banff workshop, the general structure of the primal and dual Entropy-Transport problem as well as the homogeneous perspective formulation were presented. Several examples and refinements were developed afterwards. The most recent part from Summer 2015 concerns our Hamilton-Jacobi equation in general metric spaces $(X, \mathsf{d})$ and the induced cone $\mathfrak{C}$ (cf. Section [83j) and the derivation of the geodesic equations in $\mathbb{R}^d$ (cf. Section [HS]). This last achievement now closes the circle, by showing that all the initial steps, which were done on a formal level in 2012 and the first half of 2013, have indeed a rigorous interpretation.